Reproducibility in R
In this workshop we set up Git Bash in RStudio and learned to navigate the command line in the RStudio terminal. One outcome was learning how to move between files and working out which folders on our own computers we use most often. I expect my most frequently used folders to be:
- c/users/erika/Documents/R: for learning R commands and for documenting scripts
- c/users/erika/Documents/Work: for data and files related to work tasks and projects
- c/users/erika/Documents/Dissertation Work: for data and files related to my thesis work
Shell commands are really helpful for navigating folders from the RStudio terminal. Here are some important commands I used:
- pwd: command to print your current working directory
- cd: change directory/folder
- ls: list the files and folders in your current directory
- Commands take optional arguments (flags) that you add after the command with a dash, e.g. ls -a
- Find manual page for commands by doing [command] --help or man [command] (exit man by hitting q)
- Download the shell-lesson-data.zip file and move it to your Desktop folder
- ls [folder] will list contents of a subfolder from your current location
- Unzip the download using cd Desktop, then unzip shell-lesson-data.zip
- File and folder names should have no spaces in them; use dashes or underscores instead
- Can use cd multiple times to move further into subfolders or put together a path like cd Desktop/project
- Absolute paths start with a / and work from anywhere on your computer; relative paths depend on where you are currently located
- An absolute path is like the address for a building, while a relative path is like directions from where you are to another location
- Move up one folder level using cd ..
- . is shorthand for your current folder and .. is shorthand for the folder above your current folder
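The navigation commands above can be tried out safely in a throwaway folder. This is only a minimal sketch: the practice folder and the file name notes.txt are made up for illustration, and mktemp is used so nothing in your real Documents or Desktop folders is touched.

```shell
# Build a small practice tree in a temporary folder (all names are hypothetical).
practice=$(mktemp -d)
mkdir -p "$practice/Desktop/project"
touch "$practice/Desktop/project/notes.txt"

cd "$practice"             # absolute path: works from anywhere
pwd                        # print the current working directory
ls                         # list files and folders here (shows Desktop)
ls -l                      # same command with an option added after a dash
ls Desktop                 # list a subfolder without moving into it
cd Desktop/project         # relative path: two levels in one step
ls .                       # . is the current folder
cd ..                      # .. is the folder above, so we are back in Desktop
here=$(basename "$(pwd)")  # name of the folder we ended up in
echo "$here"
```

The last line should print Desktop, since cd .. moves up one level from project.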
Secure data can be stored on a local machine that all the users can access (or in BoxHealth), while Git and GitHub are used to store the code; you then change the file path in the code to point to where the data is stored. In general, GitHub is not a data repository: it limits file sizes, and this applies to private repositories as well as public ones. In git, recording work has three parts: saving your changes to a file, adding those changes to a staging area, and then committing the staged changes to the repository.
Here are some helpful commands I learned and notes from this session:
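To see the three parts in action, one option is to check the repository state after each step. The sketch below is an illustration under assumptions: the file name analysis.R, the example identity, and the temporary folder are all hypothetical, and it uses git status --short (a compact form of git status) so each state fits on one line.

```shell
# Throwaway repository so the sketch does not touch real projects.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity set for this repository only (no --global), purely for the sketch.
git config user.name "Example Name"
git config user.email "name@example.com"

echo "# analysis script" > analysis.R    # 1. save a file in the working folder
before=$(git status --short)             # "?? analysis.R"  (untracked)
git add analysis.R                       # 2. move it to the staging area
staged=$(git status --short)             # "A  analysis.R"  (staged, new file)
git commit -q -m "Add analysis script"   # 3. commit it to the repository
after=$(git status --short)              # empty: nothing left to commit

echo "$before"
echo "$staged"
```

Only analysis.R is ever handed to git here; any data files would stay outside the repository, as described above.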
- When you start git for the first time, you need to configure your git username and email (only need to do this once per computer): git config --global user.name "[name]", git config --global user.email "[GitHub email]"
- Check current git config settings with git config --list
- One main folder per project, which we’ll turn into a git repository now and later an RStudio project
- Create new folder with mkdir [directory name]
- From within a project folder, turn it into a git repo using git init
- Hidden files have names that begin with a period (git init creates a hidden .git folder); you can see hidden files using ls -a
- Can see info about your git repo using git status
- History of moving from master to main terminology: https://www.jumpingrivers.com/blog/git-moving-master-to-main/
- Move files to the stage with git add [file name]
- Commit the staged files with git commit -m "[message]"
- Each commit represents a chunk of work, or a new version of your files
- Create a new R script, save it (Jessica recommends using a numbered system for file naming!), and do git status to see what has changed in repo
- git add 01-clean-soil-data.R and git commit -m "Initialize cleaning script"
- Commit messages usually start with a present-tense verb (e.g. "Initialize", "Add", "Fix")
- You can list several files after git add to include them all in one commit
- Can get list of commits using git log
- Add a line to the R script, then do add-commit cycle again to create second commit
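Put together, the steps in this list look roughly like the following. It is a sketch rather than the exact workshop commands: the folder name soil-project, the second commit message, and the line added to the script are assumptions, and the identity is set locally instead of with --global so the example does not change your machine-wide settings.

```shell
# Work in a temporary location (folder names are hypothetical).
cd "$(mktemp -d)"
mkdir soil-project
cd soil-project
project=$(pwd)
git init -q                    # turn the folder into a git repo
ls -a                          # the hidden .git folder now appears

# Local identity for this sketch only.
git config user.name "Example Name"
git config user.email "name@example.com"

# First add-commit cycle: create the script, stage it, commit it.
echo "# clean soil data" > 01-clean-soil-data.R
git add 01-clean-soil-data.R
git commit -q -m "Initialize cleaning script"

# Second cycle: add a line (hypothetical content), then stage and commit again.
echo "soil <- read.csv('soil.csv')" >> 01-clean-soil-data.R
git add 01-clean-soil-data.R
git commit -q -m "Read in soil data"

git log --oneline                        # one line per commit, newest first
n_commits=$(git rev-list --count HEAD)   # number of commits so far
echo "$n_commits"
```

After the two cycles the history should contain two commits, each one a saved version of the script.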