Adapted from Software Carpentry's Functions and Control Flow
Please make sure your directory structure is set up as described here
We have covered basic R usage:
- Reading data files
- Creating and manipulating variables
- Data types
- Calling built-in functions
Now we will cover answers to 3 specific questions. They are as follows.
questions:
- "How can I repeat several operations with a single command in R?"
- "How can I make data-dependent choices in R?"
- "How can I repeat operations for different datasets in R?"
objectives:
- "Write user-defined functions with function()."
- "Write conditional statements with if() and else."
- "Write and understand for() loops."
keypoints:
- "Use function() to automate specific tasks."
- "Use if and else to make choices."
- "Use for to repeat operations."
- Open RStudio and set Lesson4_ProgrammingR as your working directory.
- Copy the gapminder dataset from the Data folder.
- Clear your R environment by removing all the variables that you created previously.
- Open a new R script and read the file as
gapminder <- read.table("gapminder.txt", header = TRUE)
A function is a set of operations/commands that automates something complicated or convenient or both. A function usually gets one or more inputs called arguments. Functions often (but not always) return a value. An example of a function would be the sqrt() function. The input (the argument) must be a number, and the output (the return value) is the square root of the input number. Executing a function ('running it') is called calling the function. An example of a function call is:
b <- sqrt(a)
Functions provide:
- a name we can remember and invoke it by
- relief from the need to remember the individual operations
- a defined set of inputs and expected outputs
- rich connections to the larger programming environment
Let’s create a new script, call it functions_lesson.R, and write some examples of our own. Let’s define a function fahrenheit_to_celsius that converts temperatures from Fahrenheit to Celsius:
Input
fahrenheit_to_celsius <- function(temp) {
  celsius <- ((temp - 32) * (5/9))
  return(celsius)
}
We define fahrenheit_to_celsius by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements that are executed when it runs) is contained within curly braces ({}). The statements in the body are indented by two spaces, which makes the code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it are assigned to the arguments so that we can use them inside the function. The value of the last expression evaluated in the body is what R returns; here that is the explicit return(celsius) call. Let’s try running our function. Calling our own function is no different from calling any other function:
Input
fahrenheit_to_celsius(32)
Output
## [1] 0
Input
fahrenheit_to_celsius(212)
Output
## [1] 100
We’ve successfully called the function that we defined, and we have access to the value that we returned.
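To illustrate the point about the last evaluated expression being the return value, here is a minimal sketch of the same conversion written without return(). The name fahrenheit_to_celsius_short is made up for this example and is not part of the lesson.

```r
# Same conversion as fahrenheit_to_celsius(), but without return().
# fahrenheit_to_celsius_short is a hypothetical name used only in this sketch.
fahrenheit_to_celsius_short <- function(temp) {
  # The value of this last expression becomes the return value
  (temp - 32) * (5 / 9)
}

fahrenheit_to_celsius_short(32)
## [1] 0
fahrenheit_to_celsius_short(212)
## [1] 100
```

Both styles give the same results; an explicit return() simply makes the intent more visible to the reader.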
Composing Functions
Now that we’ve seen how to turn Fahrenheit into Celsius, it’s easy to turn Celsius to Kelvin:
Input
celsius_to_kelvin <- function(temp_C) {
  temp_K <- temp_C + 273.15
  return(temp_K)
}
Input
# Freezing Point of water
celsius_to_kelvin(0)
Output
## [1] 273.15
What about converting Fahrenheit to Kelvin? We could write out the formula, but we don’t need to. Instead, we can compose the two functions we have already created:
Input
fahrenheit_to_kelvin <- function(temp_F) {
  temp_C <- fahrenheit_to_celsius(temp_F)
  temp_K <- celsius_to_kelvin(temp_C)
  return(temp_K)
}
Input
# Freezing point of water in Kelvin
fahrenheit_to_kelvin(32)
Output
## [1] 273.15
This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here, typically half a dozen to a few dozen lines, but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.
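Composition does not require a wrapper function. As a small sketch, the two conversion functions defined earlier (repeated here so the example runs on its own) can also be combined by nesting one call inside the other:

```r
# The two functions from the lesson, repeated so this block is self-contained
fahrenheit_to_celsius <- function(temp) {
  celsius <- ((temp - 32) * (5/9))
  return(celsius)
}
celsius_to_kelvin <- function(temp_C) {
  temp_K <- temp_C + 273.15
  return(temp_K)
}

# Nested call: the inner function runs first and its result
# is passed as the argument of the outer function
celsius_to_kelvin(fahrenheit_to_celsius(32))
## [1] 273.15
```

Nesting is compact for one-off calculations, while a named function such as fahrenheit_to_kelvin is easier to reuse and to read.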
Create a function MeanLifeExp() that takes a continent as its argument and returns the mean life expectancy of that continent. For example, MeanLifeExp("Europe") returns 71.90369.
Example function call and output:
MeanLifeExp("Europe")
[1] 71.90369
Solution
Input
MeanLifeExp <- function(Continent) {
  Subset_Continent_LifeExp <- gapminder[gapminder$continent == Continent, "lifeExp"]
  lifeExp <- mean(Subset_Continent_LifeExp)
  return(lifeExp)
}
MeanLifeExp("Europe")
Output
[1] 71.90369
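The solution above reads the gapminder variable from the global environment. One possible variation, sketched here under assumptions, passes the data frame in as an argument so the function can be tried on any data with continent and lifeExp columns. The toy data frame and the name mean_life_exp are hypothetical, and the numbers are made up for illustration.

```r
# Hypothetical toy data with the same column names as gapminder;
# the values are invented for illustration only.
toy <- data.frame(
  continent = c("Europe", "Europe", "Asia", "Asia"),
  lifeExp   = c(70, 80, 55, 65)
)

# Variant of MeanLifeExp() that receives the data frame as an argument
# instead of relying on a global variable called gapminder.
mean_life_exp <- function(data, continent_name) {
  values <- data[data$continent == continent_name, "lifeExp"]
  return(mean(values))
}

mean_life_exp(toy, "Europe")
## [1] 75
mean_life_exp(toy, "Asia")
## [1] 60
```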
Let's say we have the following problem:
In the gapminder dataset, which continents have the mean life expectancy smaller or larger than 50 years?
How would we approach such a problem? What steps are needed? The concepts in the next sections will help us break this problem down into smaller pieces and eventually solve it. As we move through the lesson, think about each concept we learn and how it might apply to this problem.
Often when we're coding we want to control the flow of our actions. This can be done by setting actions to occur only if a condition or a set of conditions is met. Alternatively, we can also set an action to occur a particular number of times.
There are several ways you can control flow in R. For conditional statements, the most commonly used approaches are the constructs:
# if
if (condition is true) {
  perform action
}
# if ... else
if (condition is true) {
  perform action
} else { # that is, if the condition is false,
  perform alternative action
}
Say, for example, that we want R to print a message if a variable x is at least 10:
Input
x <- 8
if (x >= 10) {
  print("x is greater than or equal to 10")
}
x
Output
[1] 8
The print statement does not appear in the console because x is not greater than or equal to 10. To print a different message for numbers less than 10, we can add an else statement.
Input
x <- 8
if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}
Output
[1] "x is less than 10"
You can also test multiple conditions by using else if.
Input
x <- 8
if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than or equal to 5")
}
Output
[1] "x is greater than 5, but less than 10"
Important: when R evaluates the condition inside an if() statement, it is looking for a single logical value, i.e., TRUE or FALSE. This can cause some headaches for beginners. For example:
Input
x <- 4 == 3
if (x) {
  "4 equals 3"
} else {
  "4 does not equal 3"
}
Output
[1] "4 does not equal 3"
As we can see, the "does not equal" message was printed because the vector x is FALSE.
Input
x <- 4 == 3
x
Output
[1] FALSE
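Because if() works with a single TRUE or FALSE, a comparison on a longer vector needs to be reduced to one value first. In recent versions of R, a condition with more than one element is an error. The sketch below uses the base functions any() and all() for that reduction; the vector values are arbitrary.

```r
x <- c(3, 8, 12)

# The comparison returns one logical value per element of x
x >= 10
## [1] FALSE FALSE  TRUE

# any() is TRUE when at least one element is TRUE
if (any(x >= 10)) {
  print("at least one value is greater than or equal to 10")
}
## [1] "at least one value is greater than or equal to 10"

# all() is TRUE only when every element is TRUE
if (all(x >= 10)) {
  print("every value is greater than or equal to 10")
} else {
  print("not every value is greater than or equal to 10")
}
## [1] "not every value is greater than or equal to 10"
```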
Calculate the mean life expectancy of Asia (using the previous user-defined function). If the life expectancy is greater than or equal to 50, print("Life Expectancy of Asia is greater than or equal to 50"); if not, print("Life Expectancy of Asia is lower than 50").
Solution
Input
Asia_lifeExp <- MeanLifeExp("Asia")
if (Asia_lifeExp >= 50) {
  print("Life Expectancy of Asia is greater than or equal to 50")
} else {
  print("Life Expectancy of Asia is lower than 50")
}
Output
[1] "Life Expectancy of Asia is greater than or equal to 50"
Do you think we can apply if and else to our problem?
If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job. We saw for() loops in the shell lessons earlier. This is the most flexible of looping operations, but therefore also the hardest to use correctly. Avoid using for() loops unless the order of iteration is important, i.e. the calculation at each iteration depends on the results of previous iterations.
The basic structure of a for() loop is:
for (iterator in set of values) {
  do a thing
}
Input
for (i in 1:10) {
  print(i)
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
The 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.
For example, in our previous R lesson we had a data frame called myorder_df. We could iterate over each of its menuItems and display its cost.
Input
menuItems <- c("chicken", "soup", "salad", "tea")
menuType <- factor(c("solid", "liquid", "solid", "liquid"))
menuCost <- c(4.99, 2.99, 3.29, 1.89)
myorder_df <- data.frame(menuItems, menuType, menuCost)
for (items in myorder_df$menuItems) {
  # Keep only the row for the current menu item
  myorder_df_subset <- myorder_df[myorder_df$menuItems == items, ]
  print(items)
  print(myorder_df_subset$menuCost)
}
Output
[1] "chicken"
[1] 4.99
[1] "soup"
[1] 2.99
[1] "salad"
[1] 3.29
[1] "tea"
[1] 1.89
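The advice to keep for() loops for cases where each iteration depends on earlier ones can be made concrete with a small sketch. The deposits vector and its values are made up for illustration.

```r
deposits <- c(100, 50, 25)

# Order matters: each balance depends on the balance from the previous
# iteration, so a for() loop is a natural fit.
balance <- 0
for (amount in deposits) {
  balance <- balance + amount
  print(balance)
}
## [1] 100
## [1] 150
## [1] 175

# Order does not matter: each element is handled independently, so a
# vectorized expression does the same job without a loop.
deposits * 2
## [1] 200 100  50
```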
Now you are equipped with everything necessary to solve our problem.
Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.
Hint: Try the unique() function to get the unique values of a column.
Solution
gapminder <- read.table("gapminder.txt", header = TRUE)
thresholdValue <- 50
continent_list <- unique(gapminder$continent)
for (continent in continent_list) {
  # Life expectancy values for the current continent
  continent_subset <- gapminder[gapminder$continent == continent, "lifeExp"]
  tmp <- mean(continent_subset)
  if (tmp <= thresholdValue) {
    print(paste0("Average Life Expectancy in ", continent, " is less than ", thresholdValue))
  } else {
    print(paste0("Average Life Expectancy in ", continent, " is greater than ", thresholdValue))
  } # end if else condition
  rm(tmp)
}
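The same solution can also be organised by moving the comparison into a function and calling it from the loop, which combines all three ideas from this lesson. The sketch below is one possible arrangement, not the lesson's official solution. The toy data frame, the function name describe_continent, and the default threshold argument are all assumptions made for illustration.

```r
# Hypothetical toy data standing in for gapminder; the values are made up.
toy <- data.frame(
  continent = c("Africa", "Africa", "Europe", "Europe"),
  lifeExp   = c(45, 50, 70, 74)
)

# Hypothetical helper: builds the comparison message for one continent.
# threshold has a default value of 50 that callers may override.
describe_continent <- function(data, continent_name, threshold = 50) {
  avg <- mean(data[data$continent == continent_name, "lifeExp"])
  if (avg < threshold) {
    label <- " is less than "
  } else {
    label <- " is greater than or equal to "
  }
  return(paste0("Average Life Expectancy in ", continent_name, label, threshold))
}

# Loop over the unique continents and print one message per continent
for (continent_name in unique(toy$continent)) {
  print(describe_continent(toy, continent_name))
}
## [1] "Average Life Expectancy in Africa is less than 50"
## [1] "Average Life Expectancy in Europe is greater than or equal to 50"
```

Keeping the decision logic in a function means the loop body stays one line long and the comparison can be tested on its own.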
- Define a function using function().
- The body of a function should be indented.
- Call a function using name_of_the_function().
- Return the result using the return statement.
- Use if and else to make choices.
- Use for to repeat operations.