author: Sixiang Hu
date: 30th Nov, 2014
width: 1280
height: 1024
autosize: true
- R is a free, open-source statistical programming language based on the S language developed at Bell Labs.
- Many statistical functions are built in, and contributed packages extend its functionality to cutting-edge research methods.
- Free
- Download R 3.1.2 (Pumpkin Helmet):
- Download RStudio (Optional):
- http://www.rstudio.com/products/rstudio/download/
- A more polished user interface
- Easier maintenance
- Arithmetic Operations:
1+2*exp(3)/sin(4)
[1] -52.08
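R applies the usual operator precedence: * and / bind tighter than + and are evaluated left to right, so the expression above is computed as 1 + ((2 * exp(3)) / sin(4)). Note that sin() expects its argument in radians. A small sketch confirming this:

```r
# * and / are applied before +, left to right within the same level
x1 <- 1 + 2 * exp(3) / sin(4)
x2 <- 1 + ((2 * exp(3)) / sin(4))   # fully parenthesised equivalent
identical(x1, x2)
# [1] TRUE

# Parentheses change the order of evaluation
(1 + 2) * 3
# [1] 9
1 + 2 * 3
# [1] 7
```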
- Assignment:
a <- runif(4)
a
[1] 0.1926935 0.1130957 0.7909896 0.3856126
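runif(4) draws four uniform random numbers between 0 and 1, so the values shown above will differ on every run. A minimal sketch of making such results reproducible with set.seed() (the seed value 42 is arbitrary):

```r
set.seed(42)      # fix the state of the random number generator
a <- runif(4)
set.seed(42)      # reset to the same state
b <- runif(4)
identical(a, b)   # the same seed reproduces the same draws
# [1] TRUE
```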
- Use help in R:
?sin
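Besides ?sin, two other built-in helpers are handy when you only remember part of a function's name or want to see its arguments quickly. A short sketch (the search string "sin" is just an example):

```r
args(sin)         # print only the argument list of a function
apropos("sin")    # names on the search path that contain "sin" (e.g. sin, asin, sinh)
```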
R provides several data types (integer, numeric, character, date) and data structures (vector, matrix, list, data frame).
a <- sample(LETTERS, 5, replace = TRUE)
a
[1] "P" "C" "E" "Y" "L"
class(a)
[1] "character"
class(data.frame(a = a, b = a))
[1] "data.frame"
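Each of the types listed above can be checked with class(). A short sketch with one arbitrary value of each:

```r
class(5L)                      # integer (the L suffix makes an integer literal)
# [1] "integer"
class(5.5)                     # numeric (double precision)
# [1] "numeric"
class("R")                     # character
# [1] "character"
class(as.Date("2014-11-30"))   # date
# [1] "Date"
m <- matrix(1:6, nrow = 2)     # 2 x 3 matrix, filled column by column
dim(m)
# [1] 2 3
is.list(list(1, "a"))          # a list can mix types
# [1] TRUE
```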
The data frame is the most widely used data structure in R. Different columns can hold different types of values, but all values within a column must be of the same type.
a <- c(1,2,3)
b <- c("a","b","c")
df <- data.frame(a=a,b=b)
df
a b
1 1 a
2 2 b
3 3 c
df$a
[1] 1 2 3
df[2,2]
[1] b
Levels: a b c
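A few everyday data frame operations, sketched on the same small example (recreated here so the snippet runs on its own): inspecting the structure, keeping rows that satisfy a logical condition, and adding a derived column. The column name a2 is hypothetical.

```r
df <- data.frame(a = c(1, 2, 3), b = c("a", "b", "c"))
str(df)              # column names and types at a glance
df[df$a > 1, ]       # keep only the rows where a is greater than 1
df$a2 <- df$a^2      # add a new column computed from an existing one
ncol(df)
# [1] 3
```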
A list can store objects of different types and different lengths.
a <- c(1,2,3)
b <- c("a","b","c")
lt <- list(a,b)
lt
[[1]]
[1] 1 2 3
[[2]]
[1] "a" "b" "c"
lt[[2]]
[1] "a" "b" "c"
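List elements can also be named and then accessed with $. Note the difference between single brackets, which return a sub-list, and double brackets, which return the element itself. A sketch (the names num and chr are hypothetical):

```r
lt <- list(num = c(1, 2, 3), chr = c("a", "b", "c"))
lt$chr           # access by name, same as lt[["chr"]]
# [1] "a" "b" "c"
class(lt[2])     # single brackets: still a list
# [1] "list"
class(lt[[2]])   # double brackets: the element itself
# [1] "character"
```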
We can define our own functions and reuse them:
my_square <- function(x) {x*x}
my_square(5)
[1] 25
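Because R arithmetic is vectorised, my_square() already works element-wise on whole vectors, and function arguments can be given default values. A sketch of a generalised version (my_power is a hypothetical name):

```r
my_square <- function(x) {x*x}
my_square(1:3)           # applied element-wise to a vector
# [1] 1 4 9

my_power <- function(x, p = 2) {   # p defaults to 2 when not supplied
  x^p
}
my_power(5)              # uses the default p = 2
# [1] 25
my_power(2, p = 3)
# [1] 8
```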
par(mfrow=c(2,2))
plot(cars,type="l")
hist(cars$speed)
plot(cars)
hist(cars$dist)
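par(mfrow = c(2, 2)) splits the plotting area into a 2 x 2 grid that is filled row by row. To write the same four panels to a file instead of the screen, open a file device first and close it with dev.off() afterwards. A sketch using pdf() and a temporary file name:

```r
out <- tempfile(fileext = ".pdf")   # temporary output path
pdf(out, width = 8, height = 8)     # open a PDF device (size in inches)
par(mfrow = c(2, 2))                # 2 x 2 grid, filled row by row
plot(cars, type = "l")
hist(cars$speed)
plot(cars)
hist(cars$dist)
dev.off()                           # close the device and write the file
```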
model1 <- glm(mpg~gear+cyl,data=mtcars,family=Gamma)
summary(model1)
Call:
glm(formula = mpg ~ gear + cyl, family = Gamma, data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.34679 -0.08718 -0.01579 0.08210 0.25549
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0186841 0.0115534 1.617 0.117
gear -0.0020063 0.0021761 -0.922 0.364
cyl 0.0067352 0.0008912 7.558 2.48e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.02356935)
Null deviance: 2.73529 on 31 degrees of freedom
Residual deviance: 0.70551 on 29 degrees of freedom
AIC: 166.14
Number of Fisher Scoring iterations: 4
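Beyond summary(), the fitted object can be queried directly. The Gamma family uses the inverse link by default, so predictions on the response scale are the reciprocal of the linear predictor. A sketch (the car with 4 gears and 4 cylinders is a made-up observation):

```r
model1 <- glm(mpg ~ gear + cyl, data = mtcars, family = Gamma)
coef(model1)                                  # estimated coefficients
model1$family$link                            # default link for Gamma
# [1] "inverse"
new_car <- data.frame(gear = 4, cyl = 4)      # hypothetical new observation
eta <- predict(model1, newdata = new_car)     # linear predictor (link scale)
mu  <- predict(model1, newdata = new_car, type = "response")  # expected mpg
all.equal(unname(mu), 1 / unname(eta))        # inverse link: mu = 1 / eta
# [1] TRUE
```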
GLMs in R support different error families and link functions. Interactions, offsets, splines and variable transformations can all be specified within R.
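A sketch of how these features are written in a model formula. The terms are chosen only to illustrate the syntax, not as a sensible model of fuel economy; in particular the offset is purely illustrative. ns() comes from the splines package, which ships with R.

```r
library(splines)   # ships with R; ns() builds a natural cubic spline basis

model2 <- glm(
  mpg ~ gear * cyl +            # main effects plus the gear:cyl interaction
    ns(hp, df = 3) +            # natural spline in horsepower, 3 degrees of freedom
    log(wt) +                   # transformed covariate
    offset(log(disp)),          # offset: added to the linear predictor, coefficient fixed at 1
  family = Gamma(link = "log"), # Gamma errors with a log link instead of the default inverse
  data = mtcars
)
summary(model2)
```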
Fitting a model to a large dataset (e.g. more than 100 MB) can be slow.
Packages such as biglm, whose bigglm() function fits GLMs by processing the data in chunks,
can help handle such data more efficiently.
library(ggplot2)
library(ggthemes)
ggplot(data = msleep, aes(x = log(bodywt), y = sleep_total)) +
geom_point(aes(shape=vore), size=3)+
theme_wsj()
- R Tutorial (Statistics)
- CRAN R Tutorial
- Data Camp R
- Kaggle