@@ -22,14 +22,14 @@ One tool for reducing duplication is functions, which reduce duplication by iden
 Another tool for reducing duplication is **iteration**, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.

 In this chapter you'll learn about two important iteration paradigms: **imperative** and **functional**.
-On the imperative side you have tools like for loops and while loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
-However, for loops are quite verbose because they require bookkeeping code that is duplicated for every for loop.
-Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function.
+On the imperative side you have tools like `for` loops and `while` loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
+However, `for` loops are quite verbose because they require bookkeeping code that is duplicated for every `for` loop.
+Functional programming (FP) offers tools to extract out this duplicated code, so each common `for` loop pattern gets its own function.
 Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.

 ### Prerequisites

-Once you've mastered the for loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.
+Once you've mastered the `for` loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.

 ```{r}
 #| label: setup
@@ -62,7 +62,7 @@ median(df$d)
 ```

 But that breaks our rule of thumb: never copy and paste more than twice.
-Instead, we could use a for loop:
+Instead, we could use a `for` loop:

 ```{r}
 output <- vector("double", ncol(df))  # 1. output
@@ -72,17 +72,17 @@ for (i in seq_along(df)) {            # 2. sequence
 output
 ```

-Every for loop has three components:
+Every `for` loop has three components:

 1.  The **output**: `output <- vector("double", length(x))`.
     Before you start the loop, you must always allocate sufficient space for the output.
-    This is very important for efficiency: if you grow the output at each iteration using `c()` (for example), your for loop will be very slow.
+    This is very important for efficiency: if you grow the output at each iteration using `c()` (for example), your `for` loop will be very slow.

     A general way of creating an empty vector of given length is the `vector()` function.
     It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc.) and the length of the vector.

 2.  The **sequence**: `i in seq_along(df)`.
-    This determines what to loop over: each run of the for loop will assign `i` to a different value from `seq_along(df)`.
+    This determines what to loop over: each run of the `for` loop will assign `i` to a different value from `seq_along(df)`.
     It's useful to think of `i` as a pronoun, like "it".

     You might not have seen `seq_along()` before.
@@ -102,13 +102,13 @@ Every for loop has three components:
     It's run repeatedly, each time with a different value for `i`.
     The first iteration will run `output[[1]] <- median(df[[1]])`, the second will run `output[[2]] <- median(df[[2]])`, and so on.

-That's all there is to the for loop!
-Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below.
-Then we'll move on to some variations of the for loop that help you solve other problems that will crop up in practice.
+That's all there is to the `for` loop!
+Now is a good time to practice creating some basic (and not so basic) `for` loops using the exercises below.
+Then we'll move on to some variations of the `for` loop that help you solve other problems that will crop up in practice.

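As a minimal sketch of the three components working together, the loop below computes column medians for a small made-up data frame. The data frame `df_demo` and its values are assumptions for illustration, not the chapter's `df`. It preallocates the output with `vector()`, uses `seq_along()` for the sequence, and fills one element per iteration in the body.

```{r}
# Hypothetical data, for illustration only
df_demo <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(10, 20, 30, 40)
)

# 1. output: allocate the full result before looping
output <- vector("double", ncol(df_demo))

# 2. sequence: one index per column
for (i in seq_along(df_demo)) {
  # 3. body: compute and store one result
  output[[i]] <- median(df_demo[[i]])
}
output

# seq_along() copes with zero-length input; 1:length(x) counts down instead
empty <- vector("double", 0)
seq_along(empty)   # integer(0), so a loop over it never runs
1:length(empty)    # 1 0, so a loop over it runs twice
```

The last two lines hint at why `seq_along()` is usually a safer choice than `1:length(x)` for the sequence.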
 ### Exercises

-1.  Write for loops to:
+1.  Write `for` loops to:

     a.  Compute the mean of every column in `mtcars`.
     b.  Determine the type of each column in `nycflights13::flights`.
@@ -117,7 +117,7 @@ Then we'll move on to some variations of the for loop that help you solve other

     Think about the output, sequence, and body **before** you start writing the loop.

-2.  Eliminate the for loop in each of the following examples by taking advantage of an existing function that works with vectors:
+2.  Eliminate the `for` loop in each of the following examples by taking advantage of an existing function that works with vectors:

     ```{r}
     #| eval: false
@@ -142,13 +142,13 @@ Then we'll move on to some variations of the for loop that help you solve other
     }
     ```

-3.  Combine your function writing and for loop skills:
+3.  Combine your function writing and `for` loop skills:

-    a.  Write a for loop that `print()`s the lyrics to the children's song "Alice the camel".
+    a.  Write a `for` loop that `print()`s the lyrics to the children's song "Alice the camel".
     b.  Convert the nursery rhyme "ten in the bed" to a function. Generalise it to any number of people in any sleeping structure.
     c.  Convert the song "99 bottles of beer on the wall" to a function. Generalise to any number of any vessel containing any liquid on any surface.

-4.  It's common to see for loops that don't preallocate the output and instead increase the length of a vector at each step:
+4.  It's common to see `for` loops that don't preallocate the output and instead increase the length of a vector at each step:

     ```{r}
     #| eval: false
@@ -165,10 +165,10 @@ Then we'll move on to some variations of the for loop that help you solve other

 ## For loop variations

-Once you have the basic for loop under your belt, there are some variations that you should be aware of.
+Once you have the basic `for` loop under your belt, there are some variations that you should be aware of.
 These variations are important regardless of how you do iteration, so don't forget about them once you've mastered the FP techniques you'll learn about in the next section.

-There are four variations on the basic theme of the for loop:
+There are four variations on the basic theme of the `for` loop:

 1.  Modifying an existing object, instead of creating a new object.
 2.  Looping over names or values, instead of indices.
@@ -177,7 +177,7 @@ There are four variations on the basic theme of the for loop:

 ### Modifying an existing object

-Sometimes you want to use a for loop to modify an existing object.
+Sometimes you want to use a `for` loop to modify an existing object.
 For example, remember our challenge from [Chapter -@sec-functions] on functions.
 We wanted to rescale every column in a data frame:

@@ -199,7 +199,7 @@ df$c <- rescale01(df$c)
 df$d <- rescale01(df$d)
 ```

-To solve this with a for loop we again think about the three components:
+To solve this with a `for` loop we again think about the three components:

 1.  **Output**: we already have the output --- it's the same as the input!

@@ -216,7 +216,7 @@ for (i in seq_along(df)) {
 ```

 Typically you'll be modifying a list or data frame with this sort of loop, so remember to use `[[`, not `[`.
-You might have spotted that we used `[[` in all our for loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.
+You might have spotted that we used `[[` in all our `for` loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.

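Here is a small sketch of this pattern. It assumes a `rescale01()` that maps each column onto the range 0 to 1; that definition and the data frame `df_demo` are assumptions for illustration, since the chapter's own definitions are not shown in this excerpt.

```{r}
# Assumed definition: rescale a numeric vector to lie between 0 and 1
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data, for illustration only
df_demo <- data.frame(a = c(0, 5, 10), b = c(2, 4, 6))

# No output to allocate: each column is overwritten in place
for (i in seq_along(df_demo)) {
  df_demo[[i]] <- rescale01(df_demo[[i]])
}
df_demo

# `[[` extracts the column itself; `[` keeps the data frame wrapper
class(df_demo[[1]])  # "numeric"
class(df_demo[1])    # "data.frame"
```

Because `[` returns a one-column data frame rather than a vector, passing `df_demo[i]` to a function that expects a numeric vector is a common source of confusing errors.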
 ### Looping patterns

@@ -300,9 +300,9 @@ Whenever you see it, switch to a more complex result object, and then combine in
 Sometimes you don't even know how long the input sequence should run for.
 This is common when doing simulations.
 For example, you might want to loop until you get three heads in a row.
-You can't do that sort of iteration with the for loop.
-Instead, you can use a while loop.
-A while loop is simpler than a for loop because it only has two components, a condition and a body:
+You can't do that sort of iteration with the `for` loop.
+Instead, you can use a `while` loop.
+A `while` loop is simpler than a `for` loop because it only has two components, a condition and a body:

 ```{r}
 #| eval: false
@@ -312,7 +312,7 @@ while (condition) {
 }
 ```

-A while loop is also more general than a for loop, because you can rewrite any for loop as a while loop, but you can't rewrite every while loop as a for loop:
+A `while` loop is also more general than a `for` loop, because you can rewrite any `for` loop as a `while` loop, but you can't rewrite every `while` loop as a `for` loop:

 ```{r}
 #| eval: false
@@ -329,7 +329,7 @@ while (i <= length(x)) {
 }
 ```

-Here's how we could use a while loop to find how many tries it takes to get three heads in a row:
+Here's how we could use a `while` loop to find how many tries it takes to get three heads in a row:

 ```{r}
 flip <- function() sample(c("T", "H"), 1)
@@ -348,7 +348,7 @@ while (nheads < 3) {
 flips
 ```

-We mention while loops only briefly, because we hardly ever use them.
+We mention `while` loops only briefly, because we hardly ever use them.
 They're most often used for simulation, which is outside the scope of this book.
 However, it is good to know they exist so that you're prepared for problems where the number of iterations is not known in advance.

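As a deterministic sketch of a loop whose length is not fixed up front, the hypothetical helper `doublings_to_exceed()` below keeps doubling a value until it passes a limit and reports how many iterations that took. The function name and the example are assumptions for illustration, not part of the chapter.

```{r}
# Hypothetical helper: how many doublings until 1 grows past `limit`?
doublings_to_exceed <- function(limit) {
  x <- 1
  n <- 0
  # condition: checked before every iteration
  while (x <= limit) {
    # body: update the state that the condition depends on
    x <- x * 2
    n <- n + 1
  }
  n
}

doublings_to_exceed(1000)  # 10, because 2^10 = 1024 is the first power of 2 above 1000
```

The loop has only a condition and a body; there is no sequence to set up, which is what makes it a natural fit when the stopping point depends on the values computed along the way.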
@@ -356,7 +356,7 @@ However, it is good to know they exist so that you're prepared for problems wher

 1.  Imagine you have a directory full of CSV files that you want to read in.
     You have their paths in a vector, `files <- dir("data/", pattern = "\\.csv$", full.names = TRUE)`, and now want to read each one with `read_csv()`.
-    Write the for loop that will load them into a single data frame.
+    Write the `for` loop that will load them into a single data frame.

 2.  What happens if you use `for (nm in names(x))` and `x` has no names?
     What if only some of the elements are named?
@@ -396,8 +396,8 @@ However, it is good to know they exist so that you're prepared for problems wher

 ## For loops vs. functionals

-For loops are not as important in R as they are in other languages because R is a functional programming language.
-This means that it's possible to wrap up for loops in a function, and call that function instead of using the for loop directly.
+`For` loops are not as important in R as they are in other languages because R is a functional programming language.
+This means that it's possible to wrap up `for` loops in a function, and call that function instead of using the `for` loop directly.

 To see why this is important, consider (again) this simple data frame:

@@ -411,7 +411,7 @@ df <- tibble(
 ```

 Imagine you want to compute the mean of every column.
-You could do that with a for loop:
+You could do that with a `for` loop:

 ```{r}
 output <- vector("double", length(df))
@@ -454,7 +454,7 @@ col_sd <- function(df) {

 Uh oh!
 You've copied-and-pasted this code twice, so it's time to think about how to generalize it.
-Notice that most of this code is for-loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.
+Notice that most of this code is `for` loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.

 What would you do if you saw a set of functions like this:

@@ -488,10 +488,10 @@ col_summary(df, mean)

 Passing a function to another function is an extremely powerful idea, and it's one of the behaviors that makes R a functional programming language.
 It might take you a while to wrap your head around the idea, but it's worth the investment.
-In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common for loops.
+In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common `for` loops.
 The apply family of functions in base R (`apply()`, `lapply()`, `tapply()`, etc.) solves a similar problem, but purrr is more consistent and thus is easier to learn.

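Here is a minimal sketch of a function that takes another function as an argument, in the spirit of the `col_summary(df, mean)` call. The body shown here and the data frame `df_demo` are assumptions for illustration; the chapter's own definition is not reproduced in this excerpt.

```{r}
# Sketch: apply the summary function `fun` to every column of `df`
col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])
  }
  out
}

# Hypothetical data, for illustration only
df_demo <- data.frame(a = c(1, 2, 3), b = c(4, 6, 8))

col_summary(df_demo, mean)    # 2 6
col_summary(df_demo, median)  # 2 6
col_summary(df_demo, sd)      # 1 2
```

The loop bookkeeping is written once, and the only thing that changes between calls is the function passed as `fun`.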
-The goal of using purrr functions instead of for loops is to allow you to break common list manipulation challenges into independent pieces:
+The goal of using purrr functions instead of `for` loops is to allow you to break common list manipulation challenges into independent pieces:

 1.  How can you solve the problem for a single element of the list?
     Once you've solved that problem, purrr takes care of generalising your solution to every element in the list.
@@ -505,7 +505,7 @@ It also makes it easier to understand your solutions to old problems when you re
 ### Exercises

 1.  Read the documentation for `apply()`.
-    In the 2d case, what two for loops does it generalise?
+    In the 2d case, what two `for` loops does it generalise?

 2.  Adapt `col_summary()` so that it only applies to numeric columns. You might want to start with an `is_numeric()` function that returns a logical vector that has a `TRUE` corresponding to each numeric column.

@@ -524,15 +524,15 @@ Each function takes a vector as input, applies a function to each piece, and the
 The type of the vector is determined by the suffix to the map function.

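To make the role of the suffix concrete, the sketch below applies several purrr map variants to one small list. The list `x` is made up for illustration; the functions themselves (`map()`, `map_int()`, `map_dbl()`, `map_lgl()`, `map_chr()`) are part of purrr.

```{r}
library(purrr)

# Hypothetical input, for illustration only
x <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

map(x, length)          # a list with one element per input
map_int(x, length)      # integer vector:   a = 3, b = 2, c = 1
map_dbl(x, mean)        # double vector:    a = 2, b = 3, c = 10
map_lgl(x, is.integer)  # logical vector:   a = TRUE, b = FALSE, c = FALSE
map_chr(x, typeof)      # character vector: "integer", "double", "double"
```

In each case the input and the looping are the same; only the suffix, and therefore the type of the result, changes.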
 Once you master these functions, you'll find it takes much less time to solve iteration problems.
-But you should never feel bad about using a for loop instead of a map function.
+But you should never feel bad about using a `for` loop instead of a map function.
 The map functions are a step up a tower of abstraction, and it can take a long time to get your head around how they work.
 The important thing is that you solve the problem that you're working on, not write the most concise and elegant code (although that's definitely something you want to strive towards!).

-Some people will tell you to avoid for loops because they are slow.
+Some people will tell you to avoid `for` loops because they are slow.
 They're wrong!
-(Well, at least they're rather out of date, as for loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.
+(Well, at least they're rather out of date, as `for` loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.

-We can use these functions to perform the same computations as the last for loop.
+We can use these functions to perform the same computations as the last `for` loop.
 Those summary functions returned doubles, so we need to use `map_dbl()`:

 ```{r}
@@ -541,7 +541,7 @@ map_dbl(df, median)
 map_dbl(df, sd)
 ```

-Compared to using a for loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
+Compared to using a `for` loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
 This is even more apparent if we use the pipe:

 ```{r}
@@ -591,7 +591,7 @@ models <- mtcars |>
   map(~lm(mpg ~ wt, data = .x))
 ```

-Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the for loop).
+Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the `for` loop).
 `.x` in a one-sided formula corresponds to an argument in an anonymous function.

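To illustrate that correspondence, the sketch below writes the same anonymous function three ways and passes each to `map_dbl()`. The input vector `x` is made up for illustration; the `\(x)` shorthand assumes R 4.1 or later.

```{r}
library(purrr)

# Hypothetical input, for illustration only
x <- c(1, 2, 3)

# Three ways to write the same anonymous function
map_dbl(x, function(x) x * 2)  # full anonymous function
map_dbl(x, ~ .x * 2)           # one-sided formula, `.x` is the argument
map_dbl(x, \(x) x * 2)         # base R shorthand (R >= 4.1)
```

All three calls return the doubled values 2, 4, 6; the formula form simply saves typing `function(x)`.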
 When you're looking at many models, you might want to extract a summary statistic like the $R^2$.
@@ -781,7 +781,7 @@ knitr::include_graphics("diagrams/lists-map2.png")

 Note that the arguments that vary for each call come *before* the function; arguments that are the same for every call come *after*.

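A small sketch of that argument order, using `map2_chr()` with base R's `paste()`. The vectors `first` and `second` are made up for illustration.

```{r}
library(purrr)

# Hypothetical inputs, for illustration only
first <- c("a", "b", "c")
second <- c("x", "y", "z")

# `first` and `second` vary per call and come before the function;
# `sep` is the same for every call and comes after it
map2_chr(first, second, paste, sep = "-")
```

Each call receives one element from `first`, the matching element from `second`, and the same `sep`, giving "a-x", "b-y", "c-z".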
-Like `map()`, `map2()` is just a wrapper around a for loop:
+Like `map()`, `map2()` is just a wrapper around a `for` loop:

 ```{r}
 map2 <- function(x, y, f, ...) {
@@ -881,7 +881,7 @@ This makes them suitable for use in the middle of pipelines.

 ## Other patterns of for loops

-Purrr provides a number of other functions that abstract over other types of for loops.
+Purrr provides a number of other functions that abstract over other types of `for` loops.
 You'll use them less frequently than the map functions, but they're useful to know about.
 The goal here is to briefly illustrate each function, so hopefully it will come to mind if you see a similar problem in the future.
 Then you can go look up the documentation for more details.
@@ -978,7 +978,7 @@ x |> accumulate(`+`)

 ### Exercises

-1.  Implement your own version of `every()` using a for loop.
+1.  Implement your own version of `every()` using a `for` loop.
     Compare it with `purrr::every()`.
     What does purrr's version do that your version doesn't?
