Polish pipes chapter

Hadley Wickham 2022-02-16 15:51:27 -06:00
parent 6376a68ebf
commit 1b2a1b4b35
1 changed files with 78 additions and 53 deletions


@ -6,80 +6,105 @@ status("restructuring")
## Introduction ## Introduction
Pipes are a powerful tool for clearly expressing a sequence of multiple operations. The pipe, `|>`, is a powerful tool for clearly expressing a sequence of multiple operations.
We briefly introduced them in the previous chapter but before going too much farther I wanted to explain a little more about how they work and give a splash of history. We briefly introduced them in the previous chapter but before going too much farther I wanted to give a little more motivation, discuss another important pipe (`%>%`), and discuss one challenge of the pipe.
### Prerequisites
The pipe `|>` is built into R itself so you don't need anything else 😄.
But we'll also discuss another historically important pipe, `%>%`, which is provided by the core tidyverse package magrittr.
```{r setup, message = FALSE}
library(tidyverse)
```
## Why use a pipe? ## Why use a pipe?
The point of the pipe is to help you write code in a way that is easier to read and understand. Each individual dplyr function is quite simple, so to solve complex problems you'll typically need to combine multiple verbs together.
Imagine you wanted to express the following sequence of actions as R code: find keys, unlock car, start car, drive to work, park. The end of the last chapter finished with a moderately complex pipe:
You could write it as nested function calls:
```{r, eval = FALSE} ```{r, eval = FALSE}
park(drive(start_car(find("keys")), to = "work")) flights |>
filter(!is.na(arr_delay), !is.na(tailnum)) |>
group_by(tailnum) |>
summarise(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
``` ```
But writing it out using the pipe gives it a more natural and easier to read structure: Even though this pipe has four steps, it's quite easy to skim to get the main meaning: we start with flights, then filter, then group, then summarize.
What would happen if we didn't have the pipe?
We can still solve this same problem but we'd need to nest each function call inside the previous:
```{r, eval = FALSE} ```{r, eval = FALSE}
find("keys") |> summarise(
start_car() |> group_by(
drive(to = "work") |> filter(
park() flights,
!is.na(arr_delay), !is.na(tailnum)
),
tailnum
),
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
``` ```
Behind the scenes, the pipe actually transforms your code to the first form. Or use a bunch of intermediate variables:
In other words, `x |> f(y)` is equivalent to `f(x, y)`.
```{r, eval = FALSE}
flights1 <- filter(flights, !is.na(arr_delay), !is.na(tailnum))
flights2 <- group_by(flights1, tailnum)
flights3 <- summarise(flights2,
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
```
While both of these forms have their uses, the pipe generally produces code that is easier to read and easier to write.
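As a small illustration of why the piped and nested forms are interchangeable, the sketch below (assuming R 4.1.0 or later, and using a made-up numeric vector rather than the `flights` data) shows that `|>` is pure syntax: R rewrites `x |> f(y)` to `f(x, y)` when it parses the code.

```{r}
# quote() captures the parsed expression without evaluating it;
# printing it shows the nested call that the pipe was rewritten to.
quote(c(1, 4, 9) |> sqrt() |> sum())

# Because the rewrite happens at parse time, both forms give the same result.
piped  <- c(1, 4, 9) |> sqrt() |> sum()
nested <- sum(sqrt(c(1, 4, 9)))
identical(piped, nested)
```

The same reasoning applies to the `flights` pipeline above: the nested version is exactly what R evaluates, and the pipe only changes how the code reads.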
## magrittr and the `%>%` pipe ## magrittr and the `%>%` pipe
If you've been using the tidyverse for a while, you might be more familiar with `%>%` than `|>`. If you've been using the tidyverse for a while, you might be more familiar with the `%>%` pipe provided by the **magrittr** package by Stefan Milton Bache.
`%>%` comes from the **magrittr** package by Stefan Milton Bache and has been available since 2014. The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you use the tidyverse:
This pipe was so successful that in 2021 the base pipe, `|>`, was added to R 4.1.0.
`|>` is inspired by `%>%`, and the tidyverse team was involved in its design. ```{r, message = FALSE}
`|>` offers fewer features than `%>%`, but we largely believe this to be a feature. library(tidyverse)
`%>%` was an experiment and included many speculative features that seemed like a good idea at the time, but in hindsight added too much complexity relative to their advantages.
The development of the base pipe gave us an opportunity to reset back to the most useful core.
## Changing the argument mtcars %>%
group_by(cyl) %>%
There is one feature that `%>%` has that `|>` currently lacks: a very easy way to change which argument you pass the object to --- you just put a `.` where you want the object on the left of the pipe to go. summarise(n = n())
Ironically this is particularly important for many base functions which were designed well before the pipe existed.
One particularly challenging example is extracting a single column out of a data frame with `$`.
With `%>%` you can write the fairly straightforward:
```{r}
mtcars %>% .$cyl
``` ```
But the base pipe requires the rather cryptic: For simple cases `|>` and `%>%` behave identically.
So why do we recommend the base pipe?
Firstly, because it's part of base R, it's always available for you to use, even when you're not using the tidyverse.
Secondly, the `|>` is quite a bit simpler than the magrittr pipe.
In the 7 years between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we honed in on the core strengths of the pipe, allowing the base implementation to jettison esoteric and relatively unimportant features.
```{r} ### Key differences
mtcars |> (`$`)(cyl)
```
Fortunately, dplyr provides a way out of this common problem with `pull`: If you haven't used `%>%` you can skip this section; if you have, read on to learn about the most important differences.
```{r} - `%>%` allows you to use `.` as a placeholder to control how the object on the left is passed to the function on the right.
mtcars |> pull(cyl) R 4.2.0 will bring a `_` as a placeholder with the additional restriction that it must be named.
```
magrittr offers a number of other variations on the pipe that you might want to learn about. - The base pipe `|>` doesn't support any of the more complex uses of `.` such as passing `.` to more than one argument, or the special behavior when used with `.`.
We don't teach them here because none of them has been sufficiently popular that you could reasonably expect a randomly chosen R user to recognize them.
In R 4.2, the base pipe will gain its own placeholder, `_`. - The base pipe doesn't yet provide a convenient way to use `$` (and similar functions).
Must be named. With magrittr, you can write:
Doesn't solve problem above, but helps out in lots of other places.
Expect it to continue to evolve. ```{r}
mtcars %>% .$cyl
```
With the base pipe you instead need the rather cryptic:
```{r}
mtcars |> (`$`)(cyl)
```
Fortunately, you can instead use `dplyr::pull()`:
```{r}
mtcars |> pull(cyl)
```
- With `%>%`, when calling a function with no arguments, you can drop the parentheses and write (e.g.) `x %>% ungroup`.
The parentheses are always required with `|>`.
- With `%>%`, starting a pipe with `.`, like `. %>% group_by(x) %>% summarise(x)`, creates a function rather than immediately performing the pipe.
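
The differences listed above can be sketched with a few short calls. This is only an illustration, not part of the chapter's own examples: it assumes the magrittr package is installed and R 4.1.0 or later, and the names `fit` and `root_sum` and the `lm()` model are made up for the demonstration.

```{r}
library(magrittr)

# `.` placeholder: send the left-hand side to an argument other than the first.
fit <- mtcars %>% lm(mpg ~ cyl, data = .)

# With %>%, parentheses are optional for a call with no extra arguments ...
a <- c(1, 4, 9) %>% sqrt
# ... but with |> they are always required.
b <- c(1, 4, 9) |> sqrt()
identical(a, b)

# Starting a magrittr pipe with `.` builds a function instead of running it.
root_sum <- . %>% sqrt() %>% sum()
root_sum(c(1, 4, 9))
```

Writing `c(1, 4, 9) |> sqrt` without the parentheses is a syntax error with the base pipe, which is why that variant is not shown.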