
Commit

some small formatting changes
Alex Knudson committed Nov 17, 2020
1 parent a8ad238 commit 24704c7
Showing 28 changed files with 630 additions and 369 deletions.
49 changes: 2 additions & 47 deletions 020-psychometrics.Rmd
@@ -12,10 +12,10 @@ library(kableExtra)
This paper focuses on a type of psychometric experiment called a temporal order judgment (TOJ) experiment. If two distinct stimuli occur nearly simultaneously, our brains bind them into a single percept -- we perceive them as happening at the same time. Compensating for small temporal differences is beneficial for coherent multisensory experiences, particularly in visual-speech synthesis, where it is necessary to maintain an accurate representation of the sources of multisensory events.


- It was Charles Darwin who in his book _On the Origin of Species_ developed the idea that living organisms adapt in order to better survive in their environment. Sir Francis Galton, inspired by Darwin's ideas, became interested in the differences between human beings and in how to measure those differences. Galton's work on studying and measuring human differences led to the creation of psychometrics -- the science of measuring mental faculties. Around the same time that Galton was developing his theories, Johann Friedrich Herbart was also interested in studying consciousness through the scientific method, and is responsible for creating mathematical models of the mind.
+ It was Charles Darwin who in his book "On the Origin of Species" developed the idea that living organisms adapt in order to better survive in their environment. Sir Francis Galton, inspired by Darwin's ideas, became interested in the differences between human beings and in how to measure those differences. Galton's work on studying and measuring human differences led to the creation of psychometrics -- the science of measuring mental faculties. Around the same time that Galton was developing his theories, Johann Friedrich Herbart was also interested in studying consciousness through the scientific method, and is responsible for creating mathematical models of the mind.


- E.H. Weber built upon Herbart's work and set out to prove the idea of a psychological threshold. A psychological threshold is the minimum stimulus intensity necessary to activate a sensory system -- a liminal stimulus. He paved the way for experimental psychology and is the namesake of Weber's Law (Equation \@ref(eq:webers-law)), which states that the change in a stimulus that will be just noticeable is a constant ratio of the original stimulus [@britannica2014editors].
+ E.H. Weber built upon Herbart's work and set out to prove the idea of a psychological threshold. A psychological threshold is the minimum stimulus intensity necessary to activate a sensory system -- a liminal stimulus. He paved the way for experimental psychology and is the namesake of Weber's Law (Equation \@ref(eq:webers-law)), which states that the change in a stimulus that will be just noticeable is a constant ratio of the original stimulus [@ekman1959weber].
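As a quick numeric illustration of that constant ratio (the Weber fraction $k = 0.1$ below is a hypothetical value, not one from the text):

```r
# Weber's law: the just-noticeable change delta_I is a constant
# fraction k of the baseline intensity I.
k <- 0.1                # assumed Weber fraction
I <- c(100, 200, 400)   # baseline stimulus intensities
(delta_I <- k * I)      # just-noticeable differences: 10 20 40
```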


\begin{equation}
@@ -278,48 +278,3 @@ audiovisual %>%
theme_minimal() +
theme(legend.position = "none")
```



- <!-- I may want to move some of this to Results
- When this method of detecting outliers is repeated for all tasks and blocks, we end up with 17 records in total (Figure \@ref(fig:ch020-naive-prop-outliers)), one of which is the aforementioned subject.
- ```{r ch020-naive-prop-outliers, fig.cap="Proportion of correct responses across all tasks and blocks. Proportions are calculated individually for positive and negative SOAs."}
- multitask %>%
-   mutate(is_pos = soa > 0,
-          is_neg = soa < 0,
-          resp_pos = response == 1,
-          resp_neg = response == 0) %>%
-   filter(is_pos | is_neg, trial %in% c("pre", "post1")) %>%
-   mutate(correct = (is_pos & resp_pos) | (is_neg & resp_neg)) %>%
-   group_by(rid, is_pos) %>%
-   add_count() %>%
-   summarise(k = sum(correct), p = k / n) %>%
-   filter(p < 0.5) %>%
-   distinct() %>%
-   ggplot(aes(p, rid)) +
-   geom_point(size = 3) +
-   geom_vline(xintercept = 0.5, linetype = "dashed") +
-   labs(title = "Proportion of correct responses",
-        subtitle = "All tasks and blocks",
-        x = "Proportion of correct responses",
-        y = "Record ID") +
-   annotate("text", x = 0.51, y = 8, label = "Random guessing",
-            angle = 270, vjust = 0) +
-   scale_x_continuous(limits = c(0, 0.65), breaks = seq(0, 1, 0.25)) +
-   theme_minimal() +
-   theme(legend.position = "none")
- ```
- Most of the records flagged by this method of outlier detection are from the sensorimotor task, and none are from the visual task. This may be attributed to the perceived difficulty of the task. One consequence of higher temporal sensitivity is that it is easier to determine temporal order. It may also be that determining temporal order is inherently easier for certain multisensory tasks compared to others. Since the sensorimotor task does not have fixed SOA values like the other tasks, it may be perceived as more difficult. Or perhaps the mechanisms that process tactile and visual signals are not as well coupled as those that process audio and visual signals.
- -->






51 changes: 18 additions & 33 deletions 030-methods.Rmd
@@ -22,10 +22,12 @@ Psychometric functions are commonly fit using generalized linear models which al
GLMs are commonly fit using maximum likelihood estimation (MLE). The outcome of a single experiment can be represented as the result of a Bernoulli trial. The psychometric function, $F(x; \theta)$, determines the probability that the outcome is 1:


+ \setstretch{1.0}
\begin{align*}
Y &\sim \textrm{Bernoulli}(\pi) \\
\pi &= P(Y=1 \vert x; \theta) = F(x; \theta)
\end{align*}
+ \setstretch{2.0}

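This generative view is easy to sketch in code. The simulation below is a minimal illustration only; the logistic form of $F$ and all parameter values are assumptions for the example, not taken from the experiment:

```r
# Simulate Bernoulli outcomes from a logistic psychometric function
# F(x; theta) with an assumed location and scale.
set.seed(123)
x   <- seq(-0.5, 0.5, length.out = 100)       # hypothetical stimulus levels
loc <- 0; scl <- 0.1                          # assumed theta = (location, scale)
pi  <- plogis((x - loc) / scl)                # pi = F(x; theta)
y   <- rbinom(length(x), size = 1, prob = pi) # one Bernoulli trial per level
```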

If $P(Y=1 | x; \theta) = F(x;\theta)$, then $P(Y = 0 | x; \theta) = 1 - F(x;\theta)$, and hence the probability of an outcome is:
@@ -119,10 +121,12 @@ ggplot(log_lik, aes(ms, ss, fill = `Log Likelihood`)) +
When the separable data from above is fit using `R`'s `glm` function, there is a warning about fitted probabilities being $0$ or $1$, indicating that the slope is very steep.


+ \setstretch{1.0}
```{r ch030-Rubber Blue, echo=TRUE, warning=TRUE, message=FALSE}
fit <- glm(y ~ x, family = binomial("logit"))
coefficients(fit)
```
+ \setstretch{2.0}
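The warning is easy to reproduce on synthetic data. In this sketch (hypothetical data, not the document's), every response below $x = 0$ is a 0 and every response above is a 1, so the data are perfectly separable and the estimated slope diverges:

```r
# Perfectly separated data: the likelihood increases without bound as
# the slope grows, so glm() warns that fitted probabilities are
# numerically 0 or 1.
x <- seq(-1, 1, length.out = 20)
y <- as.numeric(x > 0)
fit_sep <- glm(y ~ x, family = binomial("logit"))
```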


The coefficients of the linear predictor are in slope-intercept form ($23 + 46 x$). Rearranging the coefficients into location-scale form yields:
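Equating the two forms of the linear predictor, the stated coefficients determine the location and scale:

$$
23 + 46 x = \frac{x - \mu}{\sigma} \quad \Longrightarrow \quad \sigma = \frac{1}{46}, \quad \mu = -\frac{23}{46} = -0.5
$$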
@@ -399,10 +403,12 @@ As the number of parameters in a model grows, it becomes exceedingly tedious to
The initial parameter values for this model are intentionally set to vary between $-10$ and $10$ -- in contrast to the default range of $(-2, 2)$ -- and only a few samples are drawn, in order to artificially drive up the split $\hat{R}$ statistic. The model is provided as supplementary code in the [appendix](#code).


+ \setstretch{1.0}
```{r ch030-Eager Galaxy, echo=TRUE}
fit_cp <- sampling(schools_cp, data = schools_dat, refresh = 0,
iter = 40, init_r = 10, seed = 671254821)
```
+ \setstretch{2.0}
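One way to surface the offending values (a sketch that assumes the `rstan` fit object from the chunk above) is to pull the $\hat{R}$ column from the fit summary:

```r
# Extract the split R-hat for every parameter and flag any that
# exceed the 1.10 threshold discussed below.
rhats <- summary(fit_cp)$summary[, "Rhat"]
rhats[rhats > 1.10]
```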


`Stan` warns about many different issues with this model, but the $\hat{R}$ statistic is the one of interest here. The largest is $1.71$, which is incredibly large. Gelman suggests using a threshold of $1.10$ to flag unhealthy chains.
@@ -466,11 +472,13 @@ where
$$
s_{m}^2 = \frac{1}{N-1}\sum_{n=1}^{N}\left(\theta_{m}^{(n)} - \bar{\theta}_m\right)^2
$$
+ \setstretch{2.0}


The weighted mixture of the within-chain and cross-chain variation is:


+ \setstretch{1.0}
$$
\hat{var} = \frac{N-1}{N} W + \frac{1}{N} B
$$
@@ -510,6 +518,7 @@ var_hat <- W * (N - 1) / N + B / N
The $\hat{R}$ statistic is smaller than the split $\hat{R}$ value provided by `Stan`. This is a consequence of steadily increasing or decreasing chains. The split value does what its name suggests: it splits the samples from each chain in half -- effectively doubling the number of chains and halving the number of samples per chain. In this way, the measure is more robust in detecting unhealthy chains. This also highlights the utility of using both visual and statistical tools to evaluate models. Here is the calculation of the split $\hat{R}$:


+ \setstretch{1.0}
```{r ch030-Steady Brave Windshield, echo=TRUE}
param <- "mu"
theta_tmp <- p_cp[,,param]
@@ -528,6 +537,7 @@
(mu_Rhat <- sqrt(var_hat / W))
```
+ \setstretch{2.0}
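The same computation can be packaged as a small reusable function. This sketch assumes a `draws` matrix of iterations by chains (rather than the three-dimensional `p_cp` array used above):

```r
split_rhat <- function(draws) {
  # draws: matrix of posterior draws, iterations x chains
  n <- floor(nrow(draws) / 2)
  # Split each chain in half: twice the chains, half the samples.
  halves <- cbind(draws[1:n, , drop = FALSE],
                  draws[(n + 1):(2 * n), , drop = FALSE])
  W <- mean(apply(halves, 2, var))    # within-chain variance
  B <- n * var(colMeans(halves))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # weighted mixture
  sqrt(var_hat / W)
}
```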


We've successfully replicated the calculation of the split $\hat{R}$. @vehtari2020rank propose an improved rank-normalized $\hat{R}$ for assessing the convergence of MCMC chains, and also suggest using a threshold of $1.01$.
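The rank-normalized statistic is implemented in the `posterior` package; a one-line sketch, again assuming an iterations-by-chains `draws` matrix:

```r
# Rank-normalized split R-hat; values above 1.01 warrant concern.
posterior::rhat(draws)
```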
@@ -536,6 +546,7 @@
**Effective Sample Size.** Samples from Markov chains are typically autocorrelated, which increases the uncertainty of posterior estimates. The solution is generally to reparameterize the model to avoid steep log-posterior densities. When the HMC algorithm explores difficult geometry, it can get stuck in regions of high density, which means that there is more correlation between successive samples. Equation \@ref(eq:schools-ncp) shows the centered (left) and non-centered (right) parameterizations of the 8-Schools model, and the benefit of reparameterization is conveyed by the ratio of effective sample size to actual sample size in Figure \@ref(fig:ch030-Timely-Nitrogen).


+ \setstretch{1.0}
\begin{equation}
\begin{split}
\sigma &\sim \mathcal{U}(0, \infty) \\
@@ -555,6 +566,7 @@
\end{split}
(\#eq:schools-ncp)
\end{equation}
+ \setstretch{2.0}
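The transformation behind the non-centered form can be sketched directly (the values of `mu`, `tau`, and `J` below are hypothetical): sampling a standardized variable and then shifting and scaling it implies the same distribution for $\theta$ while flattening the funnel-like geometry the sampler must explore.

```r
# Non-centered parameterization: theta_tilde is sampled on the unit
# scale, and theta = mu + tau * theta_tilde ~ Normal(mu, tau).
mu <- 5; tau <- 2; J <- 8       # assumed values
theta_tilde <- rnorm(J, 0, 1)   # sampled on the unit scale
theta <- mu + tau * theta_tilde # implied group-level parameters
```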


```{r ch030-Third Tuna}
@@ -662,7 +674,7 @@ p2
```


- If a $6^{th}$ point were to be added -- a new observation -- which of the models would be expected to predict best? Can it be estimated which model will predict best before testing with new data? One guess is that the quadratic or cubic model will do well because the linear model is potentially _underfit_ to the data and the quartic is _overfit_ to the data. Figure \@ref(fig:ch030-Cold-Fish) shows the new data point from the polynomial model. Now the linear and cubic models are trending in the wrong direction. The quadratic and quartic models are both trending down, so perhaps one of them has the correct form for the model.
+ If a $6^{th}$ point were to be added -- a new observation -- which of the models would be expected to predict best? Can it be estimated which model will predict best before testing with new data? One guess is that the quadratic or cubic model will do well because the linear model is potentially underfit to the data and the quartic is overfit to the data. Figure \@ref(fig:ch030-Cold-Fish) shows the new data point from the polynomial model. Now the linear and cubic models are trending in the wrong direction. The quadratic and quartic models are both trending down, so perhaps one of them has the correct form for the model.


```{r ch030-Cold-Fish, fig.cap="The fitted polynomial models with a new observation."}
@@ -675,10 +687,12 @@ p2 +
Figure \@ref(fig:ch030-Strawberry-Swallow) shows the 80% and 95% prediction intervals for a new observation given $x = `r x0[i]`$ as well as the true outcome as a dashed line at $y = `r round(y0[i], 3)`$. The linear model has the smallest prediction interval (PI), but completely misses the target. The remaining three models all include the observed value in their 95% PIs, but the quadratic model has the smallest PI of the three. The actual data generating polynomial is


+ \setstretch{1.0}
\begin{align*}
y &\sim \mathcal{N}(\mu, 1^2) \\
\mu &= -0.5(x - 2)^2 + 2
\end{align*}
+ \setstretch{2.0}
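Because the generating process is stated explicitly, it can be simulated directly as a check (the seed and design points below are assumptions for illustration, not the document's settings):

```r
set.seed(1)
x  <- seq(0, 4, length.out = 6)           # assumed design points
mu <- -0.5 * (x - 2)^2 + 2                # stated mean function
y  <- rnorm(length(x), mean = mu, sd = 1) # y ~ Normal(mu, 1)
```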


```{r ch030-Maximum Panther, include=FALSE}
@@ -850,7 +864,8 @@ comp %>%
"Cubic",
"Quartic"))) %>%
select(Model, elpd_diff, se_diff, p_loo, looic) %>%
- kable(digits = 4, booktabs = TRUE) %>%
+ kable(digits = 4, booktabs = TRUE,
+       caption = "LOO comparison of Polynomial equations.") %>%
kable_styling(latex_options = "hold_position")
```
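A table like this is typically produced with the `loo` package. The sketch below assumes hypothetical fit objects `fit1` through `fit4` (e.g., from `rstanarm` or `brms`):

```r
library(loo)
# Compare models by expected log pointwise predictive density (ELPD).
loo_list <- list(Linear    = loo(fit1),
                 Quadratic = loo(fit2),
                 Cubic     = loo(fit3),
                 Quartic   = loo(fit4))
loo_compare(loo_list)
```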

@@ -894,36 +909,6 @@ Much work is done before seeing the data or building a model. This includes talk

In this section we describe a simulation-based, principled workflow proposed by @betancourt2020 and broadly adopted by many members of the Bayesian community. The workflow consists of specifying the likelihood and priors, performing prior predictive checks, fitting a model, and performing posterior predictive checks. Its steps are divided into three phases: 1) pre-model, pre-data, 2) post-model, pre-data, and 3) post-model, post-data. Tables \@ref(tab:ch030-Reborn-Space), \@ref(tab:ch030-Freaky-Sledgehammer), and \@ref(tab:ch030-Bleeding-Liquid-Dagger) list the steps of each phase.

- <!-- Table \@ref(tab:ch030-workflow-steps) lists the detailed steps broken up into three phases.
- ```{r ch030-workflow-steps}
- bind_rows(
-   data.frame(Phase = "Pre-Model, Pre-Data",
-              Step = c("conceptual analysis",
-                       "define observational space",
-                       "construct summary statistics")),
-   data.frame(Phase = "Post-Model, Pre-Data",
-              Step = c("develop model",
-                       "construct summary functions",
-                       "simulate Bayesian ensemble",
-                       "prior checks",
-                       "configure algorithm",
-                       "fit simulated ensemble",
-                       "algorithmic calibration",
-                       "inferential calibration")),
-   data.frame(Phase = "Post-Model, Post-Data",
-              Step = c("fit observed data",
-                       "diagnose posterior fit",
-                       "posterior retrodictive checks",
-                       "celebrate"))
- ) %>%
-   kable(digits = 2,
-         caption = "Principled workflow", booktabs = TRUE) %>%
-   kable_styling(latex_options = c("hold_position")) %>%
-   collapse_rows(columns = 1, valign = "top")
- ```
- -->


```{r ch030-Reborn-Space}
data.frame(Step = c("Conceptual Analysis",
@@ -955,7 +940,7 @@ data.frame(Step = c("Develop Model",
"Check that the prior predictive distribution is consistent with domain expertise using the summary functions developed in the previous step.",
"Having simulated data, the next step is to fit the data generating model to the generated data. There are many different MCMC samplers with their own configurable parameters, so here is where those settings are tweaked.",
"Fit the simulated data to the model using the algorithm configured in the previous step.",
"How well did the algorithm do in fitting the simulated data? This step helps to answer the question regarding computational faithfulness. A model may be well specified, but if the algorithm used is unreliable then the posterior distribution is also unreliable, and this can lead to poor inferences. Methods for checking models is discussed in (#model-checking).",
"How well did the algorithm do in fitting the simulated data? This step helps to answer the question regarding computational faithfulness. A model may be well specified, but if the algorithm used is unreliable then the posterior distribution is also unreliable, and this can lead to poor inferences.",
"Are there any pathological behaviors in the model such as overfitting or non-identifiability? This step helps to answer the question of inferential adequacy."
)) %>%
kable(booktabs=TRUE, caption="Post-Model, Pre-Data steps.") %>%
