```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Use this document to follow along with the working session on writing functions, running provided code to see how it works as well as writing your own practice code. I recommend turning on the visual editor via the ![](https://rstudio.github.io/visual-markdown-editing/images/visual_mode_2x.png){width="3%"} button in the right-hand corner.
Make sure you are working in a clean R session (no other packages loaded).
You can find today's slides [here](https://aosmith16.github.io/spring-r-topics/slides/week06_writing_functions.html#1). The slides contain introductory material as well as a list of resources about functions in R.
## Structure of a function
We create functions with `function()`. This involves:
1. Assigning a name for the new function. Use your preferred assignment operator.
2. Allowing for inputs to the function as *arguments* within `function()`. The inputs are values or objects we will use within the function.
3. Writing code to create the function output. This code goes between the curly braces (`{`) and uses the provided inputs to return a single output.
A skeleton of a function looks like this.
```{r, eval = FALSE}
# Don't run this code chunk
my_function = function(argument1, etc.) {
  return(output)
}
```
Once we have run the code to create a function, we use it like other functions in R, passing inputs to each argument.
```{r, eval = FALSE}
# Don't run this code chunk
my_function(argument1 = input1, etc.)
```
## Basic function
Let's start with a very basic function.
This function, named `return_input`, will have a single argument, `input`. It returns whatever we give it as input, unchanged, as its output.
Run this chunk to create the function.
```{r}
return_input = function(input) {
  return(input)
}
```
The goal here is to see how we give input to the `input` argument and the function returns an output based on that input.
For example, we could give a single, numeric value to `input`.
```{r}
return_input(input = 1)
```
### Your turn
Write code in the empty code chunk below.
I want you to spend a few minutes passing different values and objects to the `input` argument of `return_input()` and looking at the output.
Pass the following to the `input` argument:
- The letter `"a"`
- A series of letters using `letters[1:5]`
- A vector of numbers like `c(1, 5, 9, 10)`
- The data.frame `mtcars`
```{r}
# Write your code in this chunk and run to see the output
```
## A function with two arguments
A function can have as many arguments as we want to make. We keep adding them in `function()` when we define the function.
As with function names, it is useful to keep argument names descriptive of the kind of input each one takes.
Our next function has two arguments and outputs the sum of these two inputs.
```{r}
sum_two = function(num1, num2) {
  sum = num1 + num2
  return(sum)
}
```
Let's pass single values to the two arguments in `sum_two`.
```{r}
sum_two(num1 = 1,
        num2 = 2)
```
### Your turn 1
Write code in the empty code chunk below.
First I want you to see what happens when using a couple of different types of input to the two arguments in `sum_two`.
Pass the following to `num1` and `num2`:
- A vector of numbers, `c(1, 5, 9, 10)` and `c(2, 3, 4, 5)`
- A string, i.e., `"a"` and `"b"`
```{r, error = TRUE}
# Write your code in this chunk and run to see the output
```
### Your turn 2
Write code in the empty code chunk below.
Now practice writing your own function called `norm_by_y`.
- The function should have two inputs. Name the arguments whatever you'd like.
- The function output will be the difference between the inputs divided by the second input. An equation for this looks like: (x - y)/y
```{r}
# Write your code in this chunk and run to see the output
```
## Remove explicit `return()`
I wanted to start out using `return()` in the function because I think it makes what is returned clearer. This is useful when you are getting started thinking about inputs and outputs.
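In R it is not standard to use this coding style. A function returns the value of the last expression it evaluates, so ending the function body with the name of the output object will *return* that object.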
The following functions, without the explicit use of `return()`, do the same thing as the ones we defined previously. This function code is more like what you'll see in other R functions.
```{r}
return_input = function(input) {
  input
}
return_input(input = 1)
```
```{r}
sum_two = function(num1, num2) {
  sum = num1 + num2
  sum
}
sum_two(num1 = 1,
        num2 = 2)
```
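As a small sketch of how the last evaluated expression becomes the output (the function names `sum_two_short` and `quiet_sum` are hypothetical, made up for illustration): the first function returns the value of its only expression. The second ends with an assignment, which still returns the value but does so invisibly, so nothing prints unless we save or print the result.

```{r}
# The last evaluated expression is the output, no intermediate object needed
sum_two_short = function(num1, num2) {
  num1 + num2
}
sum_two_short(num1 = 1,
              num2 = 2)

# If the last expression is an assignment, the value is returned invisibly
quiet_sum = function(num1, num2) {
  total = num1 + num2
}
quiet_sum(num1 = 1,
          num2 = 2)   # nothing is printed here

# The value is still returned, so we can save it and then print it
result = quiet_sum(num1 = 1,
                   num2 = 2)
result
```

This is one reason to end a function with the name of the output object rather than with the line that creates it.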
## Explore existing function
Learning how to write functions helps you understand existing functions. Sometimes when we are having trouble with a function, taking a look at the underlying *source code* will help us figure out what the function is doing and where our problem is coming from.
You can see the code within a function by running the function name without parentheses in R. Here's the code within `union()`. Don't worry about the last two printed lines when you do this.
```{r}
union
```
**Note:** See `getAnywhere()` for printing function code for certain methods of generic functions, such as `plot.lm`.
You can use `View()` to open the source code of a function in a new window in your `Source` pane. This can be easier to look through than printing in the Console.
The `edit()` function is another useful option, and works better than `View()` when you need `getAnywhere()`.
```{r}
View(union)
```
**Note:** Many base R functions are written in C and so do not return R code.
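Here is a short sketch of the two notes above. `getAnywhere()` takes the function name as a string, and a function implemented in compiled code prints a short stub instead of R code. This assumes the **stats** package is attached, which is the case in a default R session.

```{r}
# Find the code for a method of a generic function by giving its name as a string
getAnywhere("plot.lm")

# A function implemented in compiled code prints a stub rather than R code
sum

# is.primitive() tells us whether a function is one of these built-in primitives
is.primitive(sum)
```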
### Your turn
Take a few minutes and explore the code within the `union()` function.
First, use the provided vectors for `x` and `y` as inputs to `union()`.
Then, go through the `union()` source code one step at a time.
1. Use `as.vector()` on each object.
2. Concatenate the two objects with `c()`.
3. Use `unique()` on the vector from step 2.
```{r}
x = c(1, 2, 3)
y = c(2, 3, 4)
```
## Returning multiple outputs
A function in R can only return a single object, even if we attempt to return more.
Here let's make a function that sums two numbers and returns the sum as well as the two original numbers.
```{r}
return_all = function(num1, num2) {
  sum = num1 + num2
  sum
  num1
  num2
}
```
What happens when we try to use this? It returns only the very last thing we asked for.
```{r}
return_all(num1 = 1,
           num2 = 2)
```
If we want to return multiple outputs we'll need to combine them into one.
A list is one option for this. This is actually a *named* list.
```{r}
return_all2 = function(num1, num2) {
  sum = num1 + num2
  list(sum = sum,
       num1 = num1,
       num2 = num2)
}
return_all2(num1 = 1,
            num2 = 2)
```
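Because the list is named, the individual outputs are easy to pull back out once the result is saved. A minimal sketch, repeating the `return_all2()` definition so the chunk runs on its own (the object name `out` is just for illustration):

```{r}
return_all2 = function(num1, num2) {
  sum = num1 + num2
  list(sum = sum,
       num1 = num1,
       num2 = num2)
}

# Save the output so we can work with the pieces
out = return_all2(num1 = 1,
                  num2 = 2)

# Pull out a single element by name with `$` or `[[`
out$sum
out[["num1"]]

# See all the names available in the list
names(out)
```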
A data.frame as output is useful in some cases.
```{r}
return_all3 = function(num1, num2) {
  sum = num1 + num2
  data.frame(sum = sum,
             num1 = num1,
             num2 = num2)
}
return_all3(num1 = 1,
            num2 = 2)
```
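One case where a data.frame is handy is when the inputs are vectors, since each pair of inputs gets its own row. A small sketch, repeating the `return_all3()` definition so the chunk runs on its own (the object name `out_df` and the input values are made up for illustration):

```{r}
return_all3 = function(num1, num2) {
  sum = num1 + num2
  data.frame(sum = sum,
             num1 = num1,
             num2 = num2)
}

# Vectors of equal length give one row per pair of inputs
out_df = return_all3(num1 = c(1, 5, 9),
                     num2 = c(2, 3, 4))
out_df

# Columns can be pulled out by name, like in any data.frame
out_df$sum
```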
### Your turn
Write code in the empty code chunk below.
Go back to the function you made earlier called `norm_by_y()`.
- Add a 2 to the end of the function name to name a new function `norm_by_y2`.
- Edit the function so it returns both (x - y)/y and (x - y) in a list.
```{r}
# Write your code in this chunk and run to see the output
```
## Default arguments
We can assign default values to arguments. You'll see this, for example, with logical arguments (i.e., arguments that take `TRUE`/`FALSE`).
If we want the default when using the function we don't have to write the argument out.
For example, `na.rm = FALSE` is the default in `mean()`. Let's make a function built around `mean()` where the default is `TRUE` instead, so NA values are automatically stripped.
This function has two arguments, `vector` and `remove.na`. `remove.na` has a default value of `TRUE`. Note this input is passed to the `na.rm` argument within `mean()` inside the function we are making.
```{r}
mean_nona = function(vector, remove.na = TRUE) {
  mean(vector, na.rm = remove.na)
}
```
Here's how this works by default when the vector contains `NA`. We don't need to write out `remove.na`.
```{r}
mean_nona(vector = c(1, 2, NA))
```
If we want to override the default, we write the argument out.
```{r}
mean_nona(vector = c(1, 2, NA),
          remove.na = FALSE)
```
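If you forget what the defaults of a function are, you can check them without reading the source code. A brief sketch using `args()` and `formals()`, repeating the `mean_nona()` definition so the chunk runs on its own:

```{r}
mean_nona = function(vector, remove.na = TRUE) {
  mean(vector, na.rm = remove.na)
}

# Print the arguments and any default values
args(mean_nona)

# Pull out the default value for a single argument
formals(mean_nona)$remove.na
```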
## Argument names vs positions
This feels like a good time to talk about writing out argument names vs using argument positions. Both are allowed and work.
Using argument positions means we pass inputs to arguments in the order they appear in the function.
However, using argument positions tends to lead to unclear code. Here we pass a vector to the first argument in `mean()` and 0.1 to the second argument.
What is the second argument in `mean()`?
```{r}
mean(c(1:10, 50), 0.1)
```
Writing out argument names in this case makes the code a little more understandable.
```{r}
mean(x = c(1:10, 50),
     trim = 0.1)
```
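One way to answer a question like "what is the second argument?" without opening the help page is to list a function's arguments in order. A small sketch: because `mean()` is a generic function, its arguments beyond `x` are defined in its default method, `mean.default()`.

```{r}
# The generic itself only shows x and ...
args(mean)

# The default method shows the full set of arguments, in order
names(formals(mean.default))
```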
While I most often recommend writing out argument names, there are cases that we may not need to, especially for generally well-known functions.
For example, for model fitting functions like `lm()` in R, the first argument is typically the model formula. Since this is so well known it is fairly standard to leave the `formula` argument name off.
```{r}
lm(mpg ~ disp, data = mtcars)
```
Packages like **dplyr** use the dataset as the first argument as a standard, and you will rarely see the first argument name written out.
You'll sometimes see code from advanced users where they've stopped writing out any argument names. This can lead to confusion for newer users copying code and can lead to mistakes.
For example, if I wanted to fit a weighted regression, I would leave off `formula` but then use the other argument names for the dataset and weights.
```{r}
lm(mpg ~ disp,
   data = mtcars,
   weights = cyl)
```
What would happen if I left these argument names off?
```{r}
lm(mpg ~ disp,
   mtcars,
   cyl)
```
The third argument to `lm()` is actually `subset`, not `weights`. Luckily the `call` in the output shows the mistake, but that can be easy to miss.
I've seen people do something very like this and not catch it in a real analysis.
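If you do rely on positions, a quick check of the argument order can catch this kind of mistake. A hedged sketch (the object name `fit` is just for illustration): list the first few arguments of `lm()`, then look at the names R matched our inputs to in the stored call.

```{r}
# The first four arguments of lm(), in order
names(formals(lm))[1:4]

# Fit the model using positions only, then see how each input was matched
fit = lm(mpg ~ disp, mtcars, cyl)
names(fit$call)
```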
## Dot-dot-dot to pass additional arguments
Including `...` within `function()` is a way to pass additional arguments into a function. This is most common for passing arguments to existing functions used inside the one you are making.
For example, let's say we were again using `mean()` inside our function to calculate a mean.
```{r}
use_mean = function(vector) {
  mean(vector)
}
```
If we use a vector that contains `NA`, as you know, the `mean()` function will return `NA`.
```{r}
use_mean(vector = c(1, 2, NA))
```
We want the option to pass in `na.rm = TRUE` to the function. Since that already is an argument available in `mean()`, we can use `...` for this.
Notice that we literally add `...` into the arguments of the function and then use it again within `mean()`, since this is where we want to be able to input the additional arguments.
```{r}
use_mean2 = function(vector, ...) {
  mean(vector, ...)
}
```
Now we can pass arguments available in `mean()` into our function. Anything we put in after the `vector` argument is passed into `mean()`. We'll use `na.rm` here but could also use `trim`.
```{r}
use_mean2(vector = c(1, 2, NA),
          na.rm = TRUE)
```
**Important note:** When using `...`, a misspelled argument name may not raise an error, because the function receiving `...` can silently ignore names it doesn't recognize. This means you may miss typos you've made.
Look what happens when I misspell `na.rm` as `na.rn`.
```{r}
use_mean2(vector = c(1, 2, NA),
          na.rn = TRUE)
```
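If silent typos worry you, one possible guard is to check the names passed through `...` before using them. This is only a sketch under my own assumptions: the function name `use_mean_checked` and the hard-coded list of allowed names (`trim` and `na.rm`, the extra arguments of the default `mean()` method) are made up for illustration. I wrap the misspelled call in `try()` so the chunk keeps running after the error.

```{r}
use_mean_checked = function(vector, ...) {
  # Names of everything passed in through ...
  extra = names(list(...))
  # Assumed list of argument names we are willing to pass on to mean()
  allowed = c("trim", "na.rm")
  unknown = setdiff(extra, allowed)
  if(length(unknown) > 0) {
    stop("Unknown argument(s): ", paste(unknown, collapse = ", "))
  }
  mean(vector, ...)
}

# A correctly spelled argument works as before
use_mean_checked(vector = c(1, 2, NA),
                 na.rm = TRUE)

# The typo is now caught with an error instead of silently returning NA
try(use_mean_checked(vector = c(1, 2, NA),
                     na.rn = TRUE))
```

The trade-off is that the allowed names have to be kept up to date by hand. Writing `na.rm` as a regular argument instead of using `...` avoids the problem in a different way.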
## Conditional returns
Back when we made `sum_two()`, I had you look at the error you got when you passed in characters when the function expected numbers.
```{r, error = TRUE}
sum_two(num1 = "a",
        num2 = "b")
```
That error message isn't particularly clear. Adding *conditions* in this function to return a useful message if the values aren't numeric is an example of what I've seen called conditional returns or *conditional execution*.
We add *conditions* via logical statements, usually involving `if()` and `else` statements.
In the documentation for `if()` (see `` ?`if` ``) the **Usage** is:
> if(cond) expr
More commonly in functions we'll use `if()` and `else` together:
> if(cond) cons.expr else alt.expr
You can interpret this as:
> Test some condition. If the condition is met, return one thing. If it's not met, return something else.
Here's a skeleton example of an if-else statement. Note the logical "condition" in the parentheses and the sets of curly braces (`{`).
```{r, eval = FALSE}
# Don't run this code chunk
if(condition) {
  Do something
} else {
  Alternative something
}
```
You will see people skip writing out the curly braces (`{`) in some simple cases but I recommend sticking with them for the clearest code. You can read one opinion on if-else coding style in [R for Data Science](https://r4ds.had.co.nz/functions.html#code-style).
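Before using this in a function, here is a minimal runnable version of the skeleton. The object names `temperature` and `label` and the cutoff of 20 are made up for illustration.

```{r}
temperature = 25

# The condition must evaluate to a single TRUE or FALSE
if(temperature > 20) {
  label = "warm"
} else {
  label = "cool"
}

label
```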
OK, we're ready to see an example. This uses `is.character()` to test if the input given to `num1` is a character.
```{r}
sum_two_if = function(num1, num2) {
  if(is.character(num1)) {
    "You must enter a number or vector of numbers for num1."
  } else {
    num1 + num2
  }
}
```
Let's see the result when using a letter for `num1`.
```{r}
sum_two_if(num1 = "a",
           num2 = "b")
```
With numbers:
```{r}
sum_two_if(num1 = 1,
           num2 = 2)
```
We can expand the condition with an *or* statement via `||`. In this case we'd want to test if the input to *either* argument is a character. (The single `|` also works here, but `||` is the form of *or* intended for the single `TRUE`/`FALSE` value that `if()` needs.)
```{r}
sum_two_if2 = function(num1, num2) {
  if(is.character(num1) || is.character(num2)) {
    "You must enter a number or vector of numbers for both num1 and num2."
  } else {
    num1 + num2
  }
}
sum_two_if2(num1 = 1,
            num2 = "b")
```
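The functions above return the message as a character string, which means any code that uses the output keeps running with text where a number was expected. One alternative, sketched here, is to raise a real error with `stop()`. The function name `sum_two_stop` is hypothetical, and I've chosen to test with `is.numeric()` rather than `is.character()` so that other non-numeric inputs are caught as well. I wrap the failing call in `try()` so the chunk keeps running after the error.

```{r}
sum_two_stop = function(num1, num2) {
  # Stop with an informative error if either input isn't numeric
  if(!is.numeric(num1) || !is.numeric(num2)) {
    stop("You must enter a number or vector of numbers for both num1 and num2.")
  }
  num1 + num2
}

# A character input now gives an error with our message
try(sum_two_stop(num1 = 1,
                 num2 = "b"))

# Numeric inputs work as before
sum_two_stop(num1 = 1,
             num2 = 2)
```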
### Your turn 1
Write code in the code chunk below.
Starting with function `use_mean2`, included in the chunk:
1. Add in a condition to test if the `vector` argument is numeric using `is.numeric()`
2. If it is, return the mean of the vector
3. Otherwise return a message explaining what is wrong
```{r}
use_mean2 = function(vector, ...) {
  mean(vector, ...)
}
```
### Your turn 2
Take a look at the code for the `as.factor()` function, which takes a vector of values for its `x` argument.
```{r}
View(as.factor)
```
This function contains multiple conditions using if-else if-else, which we haven't seen yet. Note the function authors left out curly braces for simple conditions.
Try to summarize what each condition is and generally what is done at each step. You don't need to go into great detail.
I'll get you started for each of the three conditions:
1. First the function checks if...
2. Ignore the `!is.object(x)` part of the second condition. Second the function checks...
3. Finally, if the other two conditions aren't met the function...
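If the if-else if-else pattern is new to you, here is a sketch of it on a simpler case before you read through `as.factor()`. The function `describe_sign` is hypothetical and unrelated to the exercise. Conditions are tested in order, and the first one that is met decides the output.

```{r}
describe_sign = function(number) {
  if(number < 0) {
    "negative"
  } else if(number == 0) {
    "zero"
  } else {
    "positive"
  }
}

describe_sign(number = -3)
describe_sign(number = 0)
describe_sign(number = 10)
```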
## Doing more in a function
So far we've written very basic, one-step functions. Of course you can do many, many steps in a function to achieve your desired output.
Let's make one slightly longer function.
The `get_rsq()` function:
- Takes a dataset as input
- Fits a model to that dataset
- Gets the model summary
- Pulls out and rounds the model $R^2$ value
- Returns the rounded $R^2$
```{r}
get_rsq = function(data) {
  mod = lm(mpg ~ disp, data = data)
  summ = summary(mod)
  rsq = round(summ$r.squared, digits = 2)
  rsq
}
```
You'll note the `formula` variables are fixed, which is a common set up if you are going to analyze many datasets with the same variables in them.
This function is set up to use with the variables in the `mtcars` dataset.
```{r}
get_rsq(data = mtcars)
```
We can pass a subset of the data to this function, for example.
```{r}
get_rsq(data = subset(mtcars, cyl == 4))
```
You'd use this sort of set-up to get ready to *iterate* through many similar datasets, which is one important use of functions.
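As a rough sketch of what that iteration could look like, we can split `mtcars` into one dataset per value of `cyl` and apply the function to each piece. Using `split()` and `sapply()` here is my choice for illustration, there are other ways to loop in R, and the object names `by_cyl` and `rsq_by_cyl` are made up. The `get_rsq()` definition is repeated so the chunk runs on its own.

```{r}
get_rsq = function(data) {
  mod = lm(mpg ~ disp, data = data)
  summ = summary(mod)
  rsq = round(summ$r.squared, digits = 2)
  rsq
}

# Split mtcars into a list of datasets, one per value of cyl
by_cyl = split(mtcars, mtcars$cyl)

# Apply get_rsq() to each dataset in the list
rsq_by_cyl = sapply(by_cyl, get_rsq)
rsq_by_cyl
```

The result is a named vector with one rounded $R^2$ per group, with names taken from the values of `cyl`.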
## More on functions
You may reach a point where you need to know more about function environments or testing. Or maybe you find yourself with a series of functions that you use a lot and want to make them into a local R package.
If that's ever the case for you, [Stephanie Kirmer's talk](https://skirmer.github.io/presentations/functions_with_r.html#1) and exercises are a good place to start!