Reproducibility in R
In this workshop we worked on setting up Git Bash in RStudio and learning to navigate the command line in the RStudio terminal. One outcome of this workshop was learning how to navigate between files and identifying which folders on our own computers we are likely to use most often. I expect that my most frequently used folders will be:
- `/c/users/erika/Documents/R` for learning R commands and documenting scripts
- `/c/users/erika/Documents/Work` for data and files related to work tasks and projects
- `/c/users/erika/Documents/Dissertation Work` for data and files related to my thesis work
Shell commands are very helpful for navigating folders from the terminal in RStudio. Here are some important commands I used:
- `pwd`: print your current working directory
- `cd`: change directory/folder
- `ls`: list the files and folders in your current directory
- Commands have optional arguments that you can add to the command with a dash
- Find the manual page for a command with `[command] --help` or `man [command]` (exit `man` by pressing `q`)
- Download the shell-data folder and move it to your Desktop folder
- `ls [folder]` will list the contents of a subfolder from your current location
- Unzip the shell-data folder using `cd Desktop`, then `unzip shell-lesson-data.zip`
- File and folder names should have no spaces in them; use dashes or underscores instead
- You can use `cd` multiple times to move further into subfolders, or combine the steps into one path like `cd Desktop/project`
- Absolute paths start with a `/` and work from anywhere on your computer; relative paths depend on where you are currently located
- An absolute path is like the address of a building, while a relative path is like directions from where you are to another location
- Move up one folder level using `cd ..`
- `.` is shorthand for your current folder and `..` is shorthand for the folder above your current folder
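A minimal sketch of these navigation commands, run inside a throwaway folder created with `mktemp -d` so nothing else on your computer is touched. The folder and file names (`project`, `data`, `notes.txt`) are hypothetical, and the sketch borrows `mkdir` and `touch` (covered in later sessions) just to create something to navigate.

```shell
# Hypothetical layout: <temp folder>/project/data/notes.txt
workshop=$(mktemp -d)
cd "$workshop"
pwd                          # absolute path of the current folder

mkdir -p project/data        # nested folders; names contain no spaces
touch project/data/notes.txt
ls project/data              # list a subfolder from where we are: notes.txt

cd project/data              # relative path: two levels down in one step
pwd
cd ..                        # up one level, back to .../project
ls .                         # "." is the current folder: lists data
ls ..                        # ".." is the folder above: lists project

cd "$workshop/project/data"  # absolute path: works from anywhere
ls -F ..                     # optional argument -F marks folders with a trailing "/"
```

Comparing the `pwd` output before and after each `cd` is a quick way to see the difference between relative and absolute paths.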
For secure data, you can store the data on a local machine that all users can access (or on BoxHealth), while using Git and GitHub to store the code; the file paths in the code then point to the data's location. In general, GitHub is not a data repository because it has storage limits (for example, it blocks individual files larger than 100 MB). In Git, there are three parts: saving changes to a file, adding (staging) those changes to the staging area, and then committing them to the repository.
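The save, stage, commit cycle can be sketched in a temporary repository, with `git status --short` run between the steps to show the file moving from untracked, to staged, to committed. The file name `analysis.R` and the name/email are hypothetical placeholders, and they are set with `git config` for this repository only (no `--global`), so your own settings stay untouched.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"          # repo-local identity (hypothetical)
git config user.email "demo@example.com"

# 1. Save: write a file in the working folder
echo 'x <- 1' > analysis.R
git status --short                        # ?? analysis.R   (untracked)

# 2. Stage: add the change to the staging area
git add analysis.R
git status --short                        # A  analysis.R   (staged)

# 3. Commit: record the staged change in the repository
git commit --quiet -m "Add analysis script"
git status --short                        # no output: nothing left to commit
git log --oneline
```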
Here are some helpful commands I learned and notes for this session:
- When you use Git for the first time, you need to configure your Git username and email (you only need to do this once per computer): `git config --global user.name "[name]"` and `git config --global user.email "[GitHub email]"`
- Check current Git config settings with `git config --list`
- One main folder per project, which we’ll turn into a git repository now and later an RStudio project
- Create a new folder with `mkdir [directory name]`
- From within a project folder, turn it into a Git repo using `git init`
- Hidden files have names that begin with a period (e.g., the `.git` folder created by `git init`); you can see hidden files using `ls -a`
- See information about your Git repo using `git status`
- History of moving from master to main terminology: https://www.jumpingrivers.com/blog/git-moving-master-to-main/
- Move files to the staging area with `git add [file name]`
- Commit the staged files with `git commit -m "[message]"`
- Each commit represents a chunk of work, or a new version of your files
- Create a new R script, save it (Jessica recommends using a numbered system for file naming!), and run `git status` to see what has changed in the repo
- Stage and commit the script with `git add 01-clean-soil-data.R` and `git commit -m "Initialize cleaning script"`
- Commit messages usually start with a present tense verb
- You can list multiple files after `git add` to include them all in a particular commit
- Get a list of commits using `git log`
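- To compare against an earlier commit, use `git log` and copy the identifier (hash) of that commit, then use `git diff [hash]` to see what has changed since that version; to actually go back to that version of a file, use `git checkout [hash] -- [file name]`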
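A sketch of the history commands from this session, again in a throwaway repository with hypothetical file contents and commit messages. It shows `git log` listing commits and `git diff` comparing the current files with an earlier commit. Note that `git diff` only shows differences; the sketch uses `git checkout [hash] -- [file name]` to actually restore the earlier version of a file.

```shell
proj=$(mktemp -d)
cd "$proj"
git init --quiet
git config user.name "Demo User"         # the workshop used --global once per computer
git config user.email "demo@example.com"
git config --list --local                # show this repository's own settings

echo 'soil <- read.csv("soil.csv")' > 01-clean-soil-data.R
git add 01-clean-soil-data.R
git commit --quiet -m "Initialize cleaning script"
first=$(git rev-parse HEAD)              # full hash of the first commit

echo 'soil <- na.omit(soil)' >> 01-clean-soil-data.R
git add 01-clean-soil-data.R
git commit --quiet -m "Remove rows with missing values"

git log --oneline                        # newest commit listed first
git diff "$first"                        # what changed since the first commit
git checkout "$first" -- 01-clean-soil-data.R   # restore the old version of the file
git status --short                       # the restored file appears as a staged change
```

The restored file still has to be committed if you want to keep the older version as the newest state of the repository.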
We learned the difference between Git (which we covered last time) and GitHub. The instructors again emphasized why GitHub is useful: it lets you share projects publicly, collaborate with others, and view commits and logs much more easily than with `git log`. We followed this tutorial from Software Carpentry.
Here are some helpful commands I learned and notes for this session:
- Move files with `mv`: the first argument is the path to the file to be moved, and the second argument is the path to where it should be moved
- Create `.gitignore` as a text file and type the paths/names of files you do NOT want Git to track
- It must be named `.gitignore`
- Should be located in the main repository folder
- It can list files by name, entire directories, or wildcards with file extensions (e.g., `*.pdf`)
- Exceptions to a wildcard can be made with `!` in front of a particular path/file name
- Create a file with `touch` and the file name/extension that you want
- Remove files with `rm` and the file name
- Security keys (SSH keys) are stored on your computer; you share the public key with GitHub
- To check the `.ssh/` folder, use `ls -al ~/.ssh`
- Generate a key with `ssh-keygen -t ed25519 -C [youremailaddress@yourdomain.edu]`; the comment can also be something like "Jessica@UA_laptop"
- Choose whether or not to add a password (passphrase)
- Print the public key with `cat ~/.ssh/id_ed25519.pub`
- Copy and paste the output (right-click to copy on Windows) into a new SSH key on GitHub; give it a name that identifies which computer/machine the private key is located on, e.g. "Jessica@UA_laptop"
- `git remote add origin git@github.com:username/reponame.git` to add the connection to the GitHub repo
- `git remote -v` to check that your remotes are set up correctly
- `git push origin main` (or `git push origin master`, depending on your branch name) to push your commits to the GitHub repo
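A sketch tying together `touch`, `mv`, `rm`, `.gitignore`, and pushing to a remote. To keep it runnable without SSH keys or a GitHub account, it assumes a local bare repository (`remote.git`, a hypothetical stand-in) plays the role of GitHub; with a real GitHub repo you would pass the `git@github.com:username/reponame.git` URL to `git remote add` instead. File names and ignore patterns are illustrative only.

```shell
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"     # stand-in for the GitHub repo
mkdir "$work/project"
cd "$work/project"
git init --quiet
git config user.name "Demo User"               # repo-local identity (hypothetical)
git config user.email "demo@example.com"

touch report.pdf keep.pdf draft.R notes.txt    # create empty files
mkdir scratch && touch scratch/tmp.csv
mv draft.R 01-analysis.R                       # first arg: file, second: new path
rm notes.txt                                   # delete a file

# Ignore every PDF except keep.pdf, plus the whole scratch/ directory
printf '%s\n' '*.pdf' '!keep.pdf' 'scratch/' > .gitignore
git status --short --untracked-files=all       # report.pdf and scratch/ are not listed
git add .
git commit --quiet -m "Add analysis script and .gitignore"
git branch -M main                             # make sure the branch is called main

git remote add origin "$work/remote.git"
git remote -v                                  # origin listed for fetch and push
git push --quiet origin main
```

`git branch -M main` renames the current branch, so the push works whether your Git version starts new repositories on `master` or on `main`.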