-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsimulationExercise.Rmd
94 lines (68 loc) · 3.99 KB
/
simulationExercise.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
title: "Simulation Exercise"
author: "Hailu Teju"
date: "September 16, 2017"
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Simulation - The Exponential Distribution
For the first part of this project, we investigate the exponential distribution using R. We assume that the exponential distribution has rate parameter, $\lambda = 0.2.$ For the exponential distribution with rate parameter $\lambda$, both its theoretical mean and standard deviation are equal to $1/\lambda.$ So here we have:
$$\mu = \sigma = \frac{1}{\lambda} = \frac{1}{0.2} = 5. \quad \textrm{Thus,}\; \verb+variance+ = \sigma^2=25.$$
To this end, we simulate 1000 averages of 40 exponentials with rate parameter $\lambda = 0.2$ using R's builtin function $\fbox{rexp(n, rate) },$ and then we observe how the Law of Large Numbers and the Central Limit Theorem work.
* The Law of Large Numbers (LLN) states that the average limits to what it is estimating, and
* The Central Limit Theorem (CLT) states that the distribution of the averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases.
Below, we creates 1000 simulations of 40 exponentials with rate parameter 0.2 and show that the averages of the means approach the theoretical mean $\mu = 5$ and the averages of the variance approach the theoretical variance $\sigma^2 =25.$
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(ggplot2)
library(datasets)
data(ToothGrowth)
```
```{r simulate_exp_dist, fig.width=9, fig.height = 6, fig.align='center'}
nosim <- 1000
x <- matrix(rexp(nosim * 40, 0.2),
nosim)
means <- cumsum(apply(x, 1, mean))/(1:nosim)
vars <- cumsum(apply(x,1,var))/(1:nosim)
plot(means, col="skyblue", pch=1, xlab="Number of samples",
main="Averages of the means of 40 exponentials",
col.main="dodgerblue", col.lab="dodgerblue")
abline(h=5, col="red", lwd=2, lty=2) # comparing means with the theoretical mean mu = 1/lambda = 5
plot(vars, col="yellowgreen", pch=1, xlab="Number of samples",
main="Averages of the variances of 40 exponentials",
col.main="dodgerblue", col.lab="dodgerblue")
abline(h=25, col="red", lwd=2, lty=2)
```
The figure below also shows that when we simulte the means 40 exponentials with parameter $\lambda = 0.2$
one thousand times, we get a distribution that is approximately Gaussian.
```{r}
mns = NULL
for (i in 1:1000) mns = c(mns, mean(rexp(40, 0.2)))
hist(mns, col="wheat", main="1000 Averages of 40 random exponentials",
xlab="Average of 40 random exponential dists.", col.main="dodgerblue", col.lab="dodgerblue")
```
The R code below and the figures generated also confirm the Central Limit Theorem - the more number of independent exponentials with rate parameter $\lambda = 0.2,$ we consider, the closer the distribution of their averages get to that of a standard normal distribution.
```{r, verify_CLT, fig.width=9, fig.height = 6, fig.align='center'}
nosim <- 1000
cfunc <- function(x, n) sqrt(n) * (mean(x) - 5) / 5
dat <- data.frame(
x = c(apply(matrix(rexp(nosim * 40, 0.2),
nosim), 1, cfunc, 40),
apply(matrix(rexp(nosim * 80, 0.2),
nosim), 1, cfunc, 80),
apply(matrix(rexp(nosim * 200, 0.2),
nosim), 1, cfunc, 200)
),
size = factor(rep(c(40, 80, 200), rep(nosim, 3))))
g <- ggplot(dat, aes(x = x, fill = size)) + geom_histogram(alpha = .20, binwidth=.3, colour = "black",
aes(y = ..density..))
g <- g + stat_function(fun = dnorm, size = 2)
g <- g + facet_grid(. ~ size)
g <- g + ggtitle("Distributions of averages of n exponentials with lambda = 0.2 for n=40, 80, & 200")
g <- g + theme(plot.title=element_text(color="dodgerblue", size=14, hjust=0.5))
g <- g + theme(axis.title = element_text(size=14, color="dodgerblue"))
g
```