Ch. 8.2: Discrepancy in numbers #157

Open
ltobar21 opened this issue Jan 29, 2020 · 2 comments

Comments

@ltobar21

In the beginning of Ch. 8.2, the textbook talks about using the mode as the simplest model, and it goes on to say that using the mode as a model for the data,

"the average individual has a fairly large error of -28.8 centimeters."

However, at the end of the chapter, after suggesting the mean as a better model because it produces less error, the textbook states,

"The mean has a pretty substantial amount of error – any individual data point will be about 27 cm from the mean on average – but it’s still much better than the mode, which has an average error of about 39 cm."

To me, this inconsistency implies that using the mode as a model gives an error of both -28.8cm and 39cm. If these numbers are talking about different models, I would suggest clarifying this section to make that more obvious. If they are both describing the model using the mode, the discrepancy in average error should be amended.

@complexbrains
Contributor

complexbrains commented Feb 5, 2020

Dear ltobar21, I know it sounds confusing, but the difference between the numbers comes from how they are computed.

The number -28.8 is the average of the differences between the mode (the most frequent height within the group) and each individual's height. If you do the same calculation using the mean (average the differences between the mean and each height), you obtain 0. That makes the mean look like a perfect model, but only because the positive and negative differences cancel each other out. A fair comparison of the mode and the mean as models therefore needs a measure that prevents this cancellation: the mean squared error, which squares each difference before averaging. Taking its square root puts the error back in centimeters, and the result is a better indication of how well the mean or the mode fits the data.

So if we use the mode as the model, it deviates more from the samples (root mean squared error of the mode ≈ 39 cm) than if we use the mean as the model (root mean squared error of the mean ≈ 27 cm).
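
To make the cancellation concrete, here is a minimal Python sketch of the two error measures. The heights are made up for illustration and are not the textbook's data, and the helper names `mean_error` and `rmse` are hypothetical, so the numbers will not match -28.8, 27, or 39. It also assumes the error is defined as each height minus the model's prediction.

```python
from math import sqrt
from statistics import mean, mode

# Hypothetical heights in cm (invented for this example, NOT the textbook's data)
heights = [110, 130, 150, 170, 170, 170]


def mean_error(data, prediction):
    """Average signed error; positive and negative errors can cancel."""
    return mean(x - prediction for x in data)


def rmse(data, prediction):
    """Root mean squared error; squaring removes the sign before averaging."""
    return sqrt(mean((x - prediction) ** 2 for x in data))


mode_model = mode(heights)  # most frequent value: 170
mean_model = mean(heights)  # arithmetic mean: 150

print("mean error, mode as model:", mean_error(heights, mode_model))  # -20
print("mean error, mean as model:", mean_error(heights, mean_model))  # 0
print("RMSE, mode as model:", round(rmse(heights, mode_model), 2))
print("RMSE, mean as model:", round(rmse(heights, mean_model), 2))
```

With this toy sample, the signed error of the mean is exactly 0 even though no height except one equals the mean, which is why the signed average cannot be used to compare the two models. The RMSE is nonzero for both and smaller for the mean.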

I hope that makes it clearer. I would recommend opening the source file for that chapter (https://github.com/poldrack/psych10-book/blob/master/05-FittingModels.Rmd) and running the code snippets embedded in the text in R to see where those numbers come from. I am sure everything will make more sense then.

Thank you,
Isil

@poldrack
Owner

poldrack commented Feb 6, 2020

I'll try at some point to change the wording to make this clearer...
