parent
72e0f519dc
commit
843df1d22d
|
@ -22,14 +22,14 @@ One tool for reducing duplication is functions, which reduce duplication by iden
|
|||
Another tool for reducing duplication is **iteration**, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.
|
||||
|
||||
In this chapter you'll learn about two important iteration paradigms: **imperative** and **functional**.
|
||||
On the imperative side you have tools like for loops and while loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
|
||||
However, for loops are quite verbose because they require bookkeeping code that is duplicated for every for loop.
|
||||
Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function.
|
||||
On the imperative side you have tools like `for` loops and `while` loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
|
||||
However, `for` loops are quite verbose because they require bookkeeping code that is duplicated for every `for` loop.
|
||||
Functional programming (FP) offers tools to extract out this duplicated code, so each common `for` loop pattern gets its own function.
|
||||
Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
Once you've mastered the for loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.
|
||||
Once you've mastered the `for` loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.
|
||||
|
||||
```{r}
|
||||
#| label: setup
|
||||
|
@ -62,7 +62,7 @@ median(df$d)
|
|||
```
|
||||
|
||||
But that breaks our rule of thumb: never copy and paste more than twice.
|
||||
Instead, we could use a for loop:
|
||||
Instead, we could use a `for` loop:
|
||||
|
||||
```{r}
|
||||
output <- vector("double", ncol(df)) # 1. output
|
||||
|
@ -72,17 +72,17 @@ for (i in seq_along(df)) { # 2. sequence
|
|||
output
|
||||
```
|
||||
|
||||
Every for loop has three components:
|
||||
Every `for` loop has three components:
|
||||
|
||||
1. The **output**: `output <- vector("double", length(x))`.
|
||||
Before you start the loop, you must always allocate sufficient space for the output.
|
||||
This is very important for efficiency: if you grow the output at each iteration using `c()` (for example), your for loop will be very slow.
|
||||
This is very important for efficiency: if you grow the output at each iteration using `c()` (for example), your `for` loop will be very slow.
|
||||
|
||||
A general way of creating an empty vector of given length is the `vector()` function.
|
||||
It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc) and the length of the vector.
|
||||
|
||||
2. The **sequence**: `i in seq_along(df)`.
|
||||
This determines what to loop over: each run of the for loop will assign `i` to a different value from `seq_along(df)`.
|
||||
This determines what to loop over: each run of the `for` loop will assign `i` to a different value from `seq_along(df)`.
|
||||
It's useful to think of `i` as a pronoun, like "it".
|
||||
|
||||
You might not have seen `seq_along()` before.
|
||||
|
@ -102,13 +102,13 @@ Every for loop has three components:
|
|||
It's run repeatedly, each time with a different value for `i`.
|
||||
The first iteration will run `output[[1]] <- median(df[[1]])`, the second will run `output[[2]] <- median(df[[2]])`, and so on.
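The three components can be seen together in a minimal sketch. The vector `x` and the doubling in the body are made up for illustration and are not part of the chapter's `df` example. The last two lines show one reason `seq_along()` is usually a safer choice than `1:length(x)`: it handles zero-length input correctly.

```r
# Illustrative input; `x` is a hypothetical vector, not the chapter's `df`
x <- c(2.5, 4, 7.5)

# 1. Output: allocate the full result once, before the loop starts
output <- vector("double", length(x))

# 2. Sequence: seq_along(x) yields the indices 1, 2, 3
for (i in seq_along(x)) {
  # 3. Body: fill one slot per iteration
  output[[i]] <- x[[i]] * 2
}
output

# With zero-length input, seq_along() gives an empty sequence,
# whereas 1:length() counts down from 1 to 0
empty <- vector("double", 0)
seq_along(empty)
1:length(empty)
```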
|
||||
|
||||
That's all there is to the for loop!
|
||||
Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below.
|
||||
Then we'll move on to some variations of the for loop that help you solve other problems that will crop up in practice.
|
||||
That's all there is to the `for` loop!
|
||||
Now is a good time to practice creating some basic (and not so basic) `for` loops using the exercises below.
|
||||
Then we'll move on to some variations of the `for` loop that help you solve other problems that will crop up in practice.
|
||||
|
||||
### Exercises
|
||||
|
||||
1. Write for loops to:
|
||||
1. Write `for` loops to:
|
||||
|
||||
a. Compute the mean of every column in `mtcars`.
|
||||
b. Determine the type of each column in `nycflights13::flights`.
|
||||
|
@ -117,7 +117,7 @@ Then we'll move on to some variations of the for loop that help you solve other
|
|||
|
||||
Think about the output, sequence, and body **before** you start writing the loop.
|
||||
|
||||
2. Eliminate the for loop in each of the following examples by taking advantage of an existing function that works with vectors:
|
||||
2. Eliminate the `for` loop in each of the following examples by taking advantage of an existing function that works with vectors:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
|
@ -142,13 +142,13 @@ Then we'll move on to some variations of the for loop that help you solve other
|
|||
}
|
||||
```
|
||||
|
||||
3. Combine your function writing and for loop skills:
|
||||
3. Combine your function writing and `for` loop skills:
|
||||
|
||||
a. Write a for loop that `print()`s the lyrics to the children's song "Alice the camel".
|
||||
a. Write a `for` loop that `print()`s the lyrics to the children's song "Alice the camel".
|
||||
b. Convert the nursery rhyme "ten in the bed" to a function. Generalise it to any number of people in any sleeping structure.
|
||||
c. Convert the song "99 bottles of beer on the wall" to a function. Generalise to any number of any vessel containing any liquid on any surface.
|
||||
|
||||
4. It's common to see for loops that don't preallocate the output and instead increase the length of a vector at each step:
|
||||
4. It's common to see `for` loops that don't preallocate the output and instead increase the length of a vector at each step:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
|
@ -165,10 +165,10 @@ Then we'll move on to some variations of the for loop that help you solve other
|
|||
|
||||
## For loop variations
|
||||
|
||||
Once you have the basic for loop under your belt, there are some variations that you should be aware of.
|
||||
Once you have the basic `for` loop under your belt, there are some variations that you should be aware of.
|
||||
These variations are important regardless of how you do iteration, so don't forget about them once you've mastered the FP techniques you'll learn about in the next section.
|
||||
|
||||
There are four variations on the basic theme of the for loop:
|
||||
There are four variations on the basic theme of the `for` loop:
|
||||
|
||||
1. Modifying an existing object, instead of creating a new object.
|
||||
2. Looping over names or values, instead of indices.
|
||||
|
@ -177,7 +177,7 @@ There are four variations on the basic theme of the for loop:
|
|||
|
||||
### Modifying an existing object
|
||||
|
||||
Sometimes you want to use a for loop to modify an existing object.
|
||||
Sometimes you want to use a `for` loop to modify an existing object.
|
||||
For example, remember our challenge from [Chapter -@sec-functions] on functions.
|
||||
We wanted to rescale every column in a data frame:
|
||||
|
||||
|
@ -199,7 +199,7 @@ df$c <- rescale01(df$c)
|
|||
df$d <- rescale01(df$d)
|
||||
```
|
||||
|
||||
To solve this with a for loop we again think about the three components:
|
||||
To solve this with a `for` loop we again think about the three components:
|
||||
|
||||
1. **Output**: we already have the output --- it's the same as the input!
|
||||
|
||||
|
@ -216,7 +216,7 @@ for (i in seq_along(df)) {
|
|||
```
|
||||
|
||||
Typically you'll be modifying a list or data frame with this sort of loop, so remember to use `[[`, not `[`.
|
||||
You might have spotted that we used `[[` in all our for loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.
|
||||
You might have spotted that we used `[[` in all our `for` loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.
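To make the difference between `[[` and `[` concrete, here is a small sketch. It uses a made-up base R data frame, `df_demo`, rather than the chapter's tibble, and a simple divide-by-maximum step stands in for `rescale01()`.

```r
# Hypothetical data frame used only for this illustration
df_demo <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# `[` keeps the container: the result is still a data frame
class(df_demo["a"])

# `[[` extracts the single element: here, a numeric vector
class(df_demo[["a"]])

# Modifying in place: `[[` on both sides replaces each column's contents
for (i in seq_along(df_demo)) {
  df_demo[[i]] <- df_demo[[i]] / max(df_demo[[i]])
}
df_demo
```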
|
||||
|
||||
### Looping patterns
|
||||
|
||||
|
@ -300,9 +300,9 @@ Whenever you see it, switch to a more complex result object, and then combine in
|
|||
Sometimes you don't even know how long the input sequence should run for.
|
||||
This is common when doing simulations.
|
||||
For example, you might want to loop until you get three heads in a row.
|
||||
You can't do that sort of iteration with the for loop.
|
||||
Instead, you can use a while loop.
|
||||
A while loop is simpler than a for loop because it only has two components, a condition and a body:
|
||||
You can't do that sort of iteration with the `for` loop.
|
||||
Instead, you can use a `while` loop.
|
||||
A `while` loop is simpler than a `for` loop because it only has two components, a condition and a body:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
|
@ -312,7 +312,7 @@ while (condition) {
|
|||
}
|
||||
```
|
||||
|
||||
A while loop is also more general than a for loop, because you can rewrite any for loop as a while loop, but you can't rewrite every while loop as a for loop:
|
||||
A `while` loop is also more general than a `for` loop, because you can rewrite any `for` loop as a `while` loop, but you can't rewrite every `while` loop as a `for` loop:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
|
@ -329,7 +329,7 @@ while (i <= length(x)) {
|
|||
}
|
||||
```
|
||||
|
||||
Here's how we could use a while loop to find how many tries it takes to get three heads in a row:
|
||||
Here's how we could use a `while` loop to find how many tries it takes to get three heads in a row:
|
||||
|
||||
```{r}
|
||||
flip <- function() sample(c("T", "H"), 1)
|
||||
|
@ -348,7 +348,7 @@ while (nheads < 3) {
|
|||
flips
|
||||
```
|
||||
|
||||
We mention while loops only briefly, because we hardly ever use them.
|
||||
We mention `while` loops only briefly, because we hardly ever use them.
|
||||
They're most often used for simulation, which is outside the scope of this book.
|
||||
However, it is good to know they exist so that you're prepared for problems where the number of iterations is not known in advance.
|
||||
|
||||
|
@ -356,7 +356,7 @@ However, it is good to know they exist so that you're prepared for problems wher
|
|||
|
||||
1. Imagine you have a directory full of CSV files that you want to read in.
|
||||
You have their paths in a vector, `files <- dir("data/", pattern = "\\.csv$", full.names = TRUE)`, and now want to read each one with `read_csv()`.
|
||||
Write the for loop that will load them into a single data frame.
|
||||
Write the `for` loop that will load them into a single data frame.
|
||||
|
||||
2. What happens if you use `for (nm in names(x))` and `x` has no names?
|
||||
What if only some of the elements are named?
|
||||
|
@ -396,8 +396,8 @@ However, it is good to know they exist so that you're prepared for problems wher
|
|||
|
||||
## For loops vs. functionals
|
||||
|
||||
For loops are not as important in R as they are in other languages because R is a functional programming language.
|
||||
This means that it's possible to wrap up for loops in a function, and call that function instead of using the for loop directly.
|
||||
In R, `for` loops are not as important as they are in other languages because R is a functional programming language.
|
||||
This means that it's possible to wrap up `for` loops in a function, and call that function instead of using the `for` loop directly.
|
||||
|
||||
To see why this is important, consider (again) this simple data frame:
|
||||
|
||||
|
@ -411,7 +411,7 @@ df <- tibble(
|
|||
```
|
||||
|
||||
Imagine you want to compute the mean of every column.
|
||||
You could do that with a for loop:
|
||||
You could do that with a `for` loop:
|
||||
|
||||
```{r}
|
||||
output <- vector("double", length(df))
|
||||
|
@ -454,7 +454,7 @@ col_sd <- function(df) {
|
|||
|
||||
Uh oh!
|
||||
You've copied-and-pasted this code twice, so it's time to think about how to generalize it.
|
||||
Notice that most of this code is for-loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.
|
||||
Notice that most of this code is `for` loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.
|
||||
|
||||
What would you do if you saw a set of functions like this:
|
||||
|
||||
|
@ -488,10 +488,10 @@ col_summary(df, mean)
|
|||
|
||||
The idea of passing a function to another function is an extremely powerful idea, and it's one of the behaviors that makes R a functional programming language.
|
||||
It might take you a while to wrap your head around the idea, but it's worth the investment.
|
||||
In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common for loops.
|
||||
In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common `for` loops.
|
||||
The apply family of functions in base R (`apply()`, `lapply()`, `tapply()`, etc) solve a similar problem, but purrr is more consistent and thus is easier to learn.
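For comparison, this is roughly what the base R approach looks like. It is a sketch on a made-up data frame, `df_demo`, not the chapter's `df`. `lapply()` always returns a list, while `vapply()` takes a template (`numeric(1)` here) describing what each result should look like.

```r
# Hypothetical data frame used only for this illustration
df_demo <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# lapply() applies mean() to each column and returns a list
col_means_list <- lapply(df_demo, mean)
col_means_list

# vapply() does the same but returns a named double vector,
# and errors if any result is not a single number
col_means <- vapply(df_demo, mean, numeric(1))
col_means
```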
|
||||
|
||||
The goal of using purrr functions instead of for loops is to allow you to break common list manipulation challenges into independent pieces:
|
||||
The goal of using purrr functions instead of `for` loops is to allow you to break common list manipulation challenges into independent pieces:
|
||||
|
||||
1. How can you solve the problem for a single element of the list?
|
||||
Once you've solved that problem, purrr takes care of generalising your solution to every element in the list.
|
||||
|
@ -505,7 +505,7 @@ It also makes it easier to understand your solutions to old problems when you re
|
|||
### Exercises
|
||||
|
||||
1. Read the documentation for `apply()`.
|
||||
In the 2d case, what two for loops does it generalise?
|
||||
In the 2d case, what two `for` loops does it generalise?
|
||||
|
||||
2. Adapt `col_summary()` so that it only applies to numeric columns. You might want to start with an `is_numeric()` function that returns a logical vector that has a `TRUE` corresponding to each numeric column.
|
||||
|
||||
|
@ -524,15 +524,15 @@ Each function takes a vector as input, applies a function to each piece, and the
|
|||
The type of the vector is determined by the suffix to the map function.
|
||||
|
||||
Once you master these functions, you'll find it takes much less time to solve iteration problems.
|
||||
But you should never feel bad about using a for loop instead of a map function.
|
||||
But you should never feel bad about using a `for` loop instead of a map function.
|
||||
The map functions are a step up a tower of abstraction, and it can take a long time to get your head around how they work.
|
||||
The important thing is that you solve the problem that you're working on, not write the most concise and elegant code (although that's definitely something you want to strive towards!).
|
||||
|
||||
Some people will tell you to avoid for loops because they are slow.
|
||||
Some people will tell you to avoid `for` loops because they are slow.
|
||||
They're wrong!
|
||||
(Well, at least they're rather out of date, as for loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.
|
||||
(Well, at least they're rather out of date, as `for` loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.
|
||||
|
||||
We can use these functions to perform the same computations as the last for loop.
|
||||
We can use these functions to perform the same computations as the last `for` loop.
|
||||
Those summary functions returned doubles, so we need to use `map_dbl()`:
|
||||
|
||||
```{r}
|
||||
|
@ -541,7 +541,7 @@ map_dbl(df, median)
|
|||
map_dbl(df, sd)
|
||||
```
|
||||
|
||||
Compared to using a for loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
|
||||
Compared to using a `for` loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
|
||||
This is even more apparent if we use the pipe:
|
||||
|
||||
```{r}
|
||||
|
@ -591,7 +591,7 @@ models <- mtcars |>
|
|||
map(~lm(mpg ~ wt, data = .x))
|
||||
```
|
||||
|
||||
Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the for loop).
|
||||
Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the `for` loop).
|
||||
`.x` in a one-sided formula corresponds to an argument in an anonymous function.
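The correspondence can be checked directly. The sketch below assumes purrr is installed. The names `by_cyl`, `models_formula`, and `models_fun` are introduced here for illustration, and `split()` from base R is used to build the list of data frames.

```r
library(purrr)

# One data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Formula shorthand: `.x` stands for the current list element
models_formula <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Equivalent anonymous function with an explicitly named argument
models_fun <- map(by_cyl, function(df) lm(mpg ~ wt, data = df))

# Both approaches fit the same models, so the slopes agree
slopes_formula <- map_dbl(models_formula, ~ coef(.x)[["wt"]])
slopes_fun <- map_dbl(models_fun, function(mod) coef(mod)[["wt"]])
all.equal(slopes_formula, slopes_fun)
```

The formula form is shorter, while the explicit function makes the argument name visible, which can help in longer pipelines.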
|
||||
|
||||
When you're looking at many models, you might want to extract a summary statistic like the $R^2$.
|
||||
|
@ -781,7 +781,7 @@ knitr::include_graphics("diagrams/lists-map2.png")
|
|||
|
||||
Note that the arguments that vary for each call come *before* the function; arguments that are the same for every call come *after*.
|
||||
|
||||
Like `map()`, `map2()` is just a wrapper around a for loop:
|
||||
Like `map()`, `map2()` is just a wrapper around a `for` loop:
|
||||
|
||||
```{r}
|
||||
map2 <- function(x, y, f, ...) {
|
||||
|
@ -881,7 +881,7 @@ This makes them suitable for use in the middle of pipelines.
|
|||
|
||||
## Other patterns of for loops
|
||||
|
||||
The purrr package provides a number of other functions that abstract over other types of for loops.
|
||||
The purrr package provides a number of other functions that abstract over other types of `for` loops.
|
||||
You'll use them less frequently than the map functions, but they're useful to know about.
|
||||
The goal here is to briefly illustrate each function, so hopefully it will come to mind if you see a similar problem in the future.
|
||||
Then you can go look up the documentation for more details.
|
||||
|
@ -978,7 +978,7 @@ x |> accumulate(`+`)
|
|||
|
||||
### Exercises
|
||||
|
||||
1. Implement your own version of `every()` using a for loop.
|
||||
1. Implement your own version of `every()` using a `for` loop.
|
||||
Compare it with `purrr::every()`.
|
||||
What does purrr's version do that your version doesn't?
|
||||
|
||||
|
|