- pres: contains all Rmarkdown files and their knitted results. Contents below:
  - ml2
    - Contains an overview of caret, knn, and naive Bayes
  - ml1
    - Contains overcomplicated knn code; good as an exercise in overly fancy R code and not much else
  - html-scraping
    - Contains a primer on scraping websites with `rvest`, as well as a brief introduction to pipes and a few user-created functions. Read this, then `scraping.R` as a follow-up with more code to play with.
  - copyonmodify
    - Discusses how we should avoid for loops and "growing vectors" in general, due to some of the fun little quirks of R.
- R: contains all R code. Contents below:
  - eda1/R
    - Feature elimination tricks in R!!
  - final.R
    - Example of automated EDA and some model training, from our final office hours :(
  - logo.R
    - R code used to make the logo for this repository
  - scraping.R
    - Example HTML scraping code
  - tidy1.R
    - A quick primer on dplyr
  - count-and-pipes.R
    - Counting with dplyr and piping with magrittr
  - applied.R
    - Contains the first really advanced material we will cover here: a review of apply/lapply (the mathematical equivalent of mapping), anonymous functions, lists of functions, "function factories" (closures), and finally one crazy example that brings everything together. Will be made into an .Rmd soon enough.
  - lm_1.R
    - Contains the basics of linear modeling
  - ml1.R
    - Contains overcomplicated knn
  - json2gif
    - Contains simple sample code that uses a JSON file to show the movement of bodies through time
- src: contains C/C++ code used to speed up R. Currently this is empty, and we may not use this directory at all. Interested parties can open an issue, email me, or message me on Slack, and we will work on this. For now, it is enough to know that this is part of the structure of a big R project.
- fig: contains generated images and figures.
- data: contains minimal data. It is best to save data here not as CSV but as .RData/.rda, because those files are much, much lighter (`save()` compresses them by default).
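To see why the data advice above matters, here is a minimal sketch that writes the same data frame both as CSV and as .rda to temporary files (standing in for data/) and compares their sizes. The data frame is made up and deliberately repetitive, purely for illustration; how much you save depends on your actual data.

```r
# Made-up, highly repetitive data, purely for illustration.
df <- data.frame(
  id    = rep(1:100, times = 100),
  group = rep(c("control", "treatment"), times = 5000)
)

# Temporary files stand in for the data/ directory.
csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".rda")

write.csv(df, csv_path, row.names = FALSE)
save(df, file = rda_path)  # save() uses gzip compression by default

# On-disk size in bytes of each format.
sizes <- c(csv = file.size(csv_path), rda = file.size(rda_path))
sizes
```

Loading it back later is just `load("data/yourfile.rda")` from the project root, which re-creates `df` under its original name.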
This repository is a glimpse of what a well-structured R project looks like. In general, we put R code in the R directory, the pretty output in its own directory, images in their own directory, and low-level code in a src directory. If you intend to develop an R package (which I would be happy to discuss), a good read is Hadley's "package structure". This is also just useful information for your own R projects. I will provide (opinionated) thoughts on workflow, project structure, etc. later on.
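The copyonmodify presentation argues against for loops that grow vectors. Here is a minimal base-R sketch of the styles involved: growing with `c()`, preallocating with `numeric(n)`, and letting vectorised arithmetic or `vapply()` do the work. The function names are hypothetical, and timings vary by machine, so none are shown.

```r
# Growing: every c() call allocates a new, longer vector and copies the old contents.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: allocate once, then fill the existing vector element by element.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# No explicit loop: vectorised arithmetic, or an apply-style map with vapply().
vectorised <- function(n) seq_len(n)^2
mapped     <- function(n) vapply(seq_len(n), function(i) i^2, numeric(1))

n <- 1e4
identical(grow(n), prealloc(n))
# [1] TRUE
system.time(grow(n))
system.time(prealloc(n))
```

All four return the same numbers. The growing version does quadratic copying work as `n` increases, which is why it is the one to avoid.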
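Simple solution: don't put `install.packages()` in Rmarkdown files.
More complex solution: set the CRAN mirror explicitly, since knitting runs non-interactively and R cannot prompt you to pick one: `install.packages("packagename", repos = "http://cran.us.r-project.org")`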
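A common middle ground is to install only what is missing, from a setup script you run once in the console. This keeps `install.packages()` out of the Rmd itself. The helper below is a hypothetical sketch (`install_if_missing` is not part of any package). It relies on base R's `requireNamespace()`, which returns `FALSE` instead of erroring when a package is not installed.

```r
# Hypothetical helper: install only the packages that are not already available.
# Run it from the console or a setup script, not from inside an Rmd.
install_if_missing <- function(pkgs, repos = "http://cran.us.r-project.org") {
  available  <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)  # names of the packages that were (attempted to be) installed
}
```

For example, `install_if_missing(c("rvest", "caret"))` would install whichever of those two is not yet available and do nothing otherwise.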
First attempt at answering: `setwd()` does not work as you would expect in knitr, because knitr restores the working directory after each chunk. Instead, in the R setup chunk, run `knitr::opts_knit$set(root.dir = '/path/to/root/dir/of/project')`, or set the knit directory with the RStudio GUI.
A simpler but far less reproducible approach is to just use absolute paths. In general, though, it is better to use relative paths, so prefer the solution above. Setting the project's root directory tells knitr to evaluate your R code with that directory as the working directory, so all your relative paths resolve from there.
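A final solution: let's say you have an Rmarkdown file in pres and a data file in data. By default, knitr evaluates an Rmd with the Rmd's own folder (here pres) as the working directory, so the Rmd file can reach the data with a relative path. Note that `load()` restores the saved objects under their original names and returns only those names, so do not assign its result:
load('../data/myfile.RData') # restores the saved objects (e.g. df) into the current environment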
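One detail worth knowing: `load()` does not return the data itself. It re-creates the saved objects under their original names and returns a character vector of those names. So `df <- load(...)` leaves `df` holding a name like `"df"` rather than a data frame. The minimal sketch below uses temporary files in place of `data/myfile.RData`. It also shows `saveRDS()`/`readRDS()`, base R's alternative for a single object when you do want to assign the result.

```r
# A small data frame; temporary files stand in for files in data/.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")

save(df, file = rdata_path)  # stores the object together with its name
saveRDS(df, rds_path)        # stores a single object without a name

rm(df)
loaded <- load(rdata_path)   # re-creates `df` in this environment...
print(loaded)                # ...and returns only the object names
# [1] "df"

df_copy <- readRDS(rds_path) # readRDS() returns the object itself
identical(df, df_copy)
# [1] TRUE
```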
- caret documentation
- ml metrics
- naive bayes overview
- naive bayes math/fast naive bayes
- awesome-msds
  - an MSDS student's repository containing awesome resources for the program
- awesome-r
  - awesome R packages
- rmarkdown manual
  - an amazing resource for knitting
- knitr options
  - more knitting resources
- why should I use functions
  - Functions not only make your code more readable, they can also make repetitive tasks easier. In my opinion, we should write many small functions and combine them into bigger ones. This makes our code more readable and more useful overall. See below for an example:
# Let us say we want to be able to take the log of any number, and if it is negative,
# we want to take its absolute value first. This is not directly useful, but in math
# it pops up a lot (e.g. integrating 1/x gives log|x|, common in differential equations)
square <- function(x){
x*x
}
# Yes, there is an absolute value function, abs(), but this is for demonstration purposes
absval <- function(x){
sqrt(square(x))
}
# We are including `...` because the log() function can take extra arguments, e.g.
# base. We want to be able to have those be allowed in our function too.
abslog <- function(x,...){
log(absval(x),...)
}
abslog(-2)
# [1] 0.6931472
abslog(2)
# [1] 0.6931472
# Now let's see the ... in action
abslog(-10, base = 10)
# [1] 1
abslog(3432, base = 2)
# [1] 11.74483
abslog(-3432, base = 2)
# [1] 11.74483
- more to come
- even more
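The same idea of small, composable functions leads to "function factories" (closures), one of the topics in applied.R. The sketch below builds one abslog-style function per log base. `make_abslog()` is a hypothetical name, and it uses base R's `abs()` instead of the hand-rolled `absval()` from the example above. `lapply()` then maps a list of these functions, via an anonymous function, over a single input.

```r
# Hypothetical function factory: returns a new function that remembers `base`.
make_abslog <- function(base = exp(1)) {
  force(base)  # evaluate `base` now so each returned function keeps its own value
  function(x) log(abs(x), base = base)
}

abslog2  <- make_abslog(2)
abslog10 <- make_abslog(10)

abslog10(-1000)
# [1] 3

# A list of functions, mapped over with lapply() and an anonymous function.
loggers <- list(natural = make_abslog(), base2 = abslog2, base10 = abslog10)
results <- lapply(loggers, function(f) f(-8))
results$base2
# [1] 3
```

The design choice here is that the factory fixes `base` once, so the returned functions take a single argument and can be passed around (or stored in a list) like any other value.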