# Workflow: Pipes {#workflow-pipes}

```{r, results = "asis", echo = FALSE}
status("restructuring")
```

## Introduction

The pipe, `|>`, is a powerful tool for clearly expressing a sequence of multiple operations.
We briefly introduced it in the previous chapter, but before going much further I want to give a little more motivation, discuss another important pipe (`%>%`), and discuss one challenge of the pipe.

## Why use a pipe?

Each individual dplyr function is quite simple, so to solve complex problems you'll typically need to combine multiple verbs.
The last chapter finished with a moderately complex pipe:

```{r, eval = FALSE}
flights |>
  filter(!is.na(arr_delay), !is.na(tailnum)) |>
  group_by(tailnum) |>
  summarise(
    delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  )
```

Even though this pipe has four steps, it's quite easy to skim to get the main meaning: we start with flights, then filter, then group, then summarize.

What would happen if we didn't have the pipe?
We could still solve this same problem, but we'd need to nest each function call inside the previous:

```{r, eval = FALSE}
summarise(
  group_by(
    filter(
      flights,
      !is.na(arr_delay), !is.na(tailnum)
    ),
    tailnum
  ),
  delay = mean(arr_delay, na.rm = TRUE),
  n = n()
)
```

Or use a bunch of intermediate variables:

```{r, eval = FALSE}
flights1 <- filter(flights, !is.na(arr_delay), !is.na(tailnum))
flights2 <- group_by(flights1, tailnum)
flights3 <- summarise(flights2,
  delay = mean(arr_delay, na.rm = TRUE),
  n = n()
)
```

While both of these forms have their uses, the pipe generally produces code that is easier to read and easier to write.

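To make the equivalence of the three styles concrete, here is a minimal sketch that uses only base R and the built-in `mtcars` data, so it runs without loading any packages. It is a stand-in for the `flights` example rather than a reproduction of it, and it assumes R 4.1.0 or later, since `|>` is not available before that.

```r
# Three ways to count the four-cylinder cars in mtcars, using base R only.

# 1. Piped: read from top to bottom.
piped <- mtcars |>
  subset(cyl == 4) |>
  nrow()

# 2. Nested: read from the inside out.
nested <- nrow(subset(mtcars, cyl == 4))

# 3. Intermediate variables: one name per step.
four_cyl <- subset(mtcars, cyl == 4)
stepwise <- nrow(four_cyl)

# All three give the same answer.
identical(piped, nested) && identical(nested, stepwise)

# The parser rewrites a base pipe into the nested form,
# which you can see by quoting a piped expression.
quote(mtcars |> subset(cyl == 4) |> nrow())
```

The last line prints `nrow(subset(mtcars, cyl == 4))`: the piped and nested forms are the same code as far as R is concerned, so the choice between them is purely about readability.
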
## magrittr and the `%>%` pipe

If you've been using the tidyverse for a while, you might be more familiar with the `%>%` pipe provided by the **magrittr** package by Stefan Milton Bache.
The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you use the tidyverse:

```{r, message = FALSE}
library(tidyverse)

mtcars %>%
  group_by(cyl) %>%
  summarise(n = n())
```

For simple cases `|>` and `%>%` behave identically.
So why do we recommend the base pipe?
Firstly, because it's part of base R, it's always available for you to use, even when you're not using the tidyverse.
Secondly, `|>` is quite a bit simpler than the magrittr pipe.
In the 7 years between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we honed in on the core strength of the pipe, allowing the base implementation to jettison esoteric and relatively unimportant features.

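As a quick check of the claim that the two pipes agree in simple cases, the sketch below runs the same two-step pipeline with each operator and compares the results. It loads magrittr directly and otherwise sticks to base R functions, so the example does not depend on the rest of the tidyverse. It assumes R 4.1.0 or later.

```r
library(magrittr)

# The same pipeline written with the base pipe...
base_result <- mtcars |>
  subset(cyl == 6) |>
  nrow()

# ...and with the magrittr pipe.
magrittr_result <- mtcars %>%
  subset(cyl == 6) %>%
  nrow()

# TRUE: when the left-hand side is simply passed as the first
# argument, the two pipes are interchangeable.
identical(base_result, magrittr_result)
```
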
### Key differences

If you haven't used `%>%` you can skip this section; if you have, read on to learn about the most important differences.

- `%>%` allows you to use `.` as a placeholder to control how the object on the left is passed to the function on the right.
    R 4.2.0 will bring `_` as a placeholder for `|>`, with the additional restriction that the argument it is passed to must be named.

- The base pipe `|>` doesn't support any of the more complex uses of `.`, such as passing `.` to more than one argument, or the special behavior when `.` is used inside a nested function call.

- The base pipe doesn't yet provide a convenient way to use `$` (and similar functions).
    With magrittr, you can write:

    ```{r}
    mtcars %>% .$cyl
    ```

    With the base pipe you instead need the rather cryptic:

    ```{r}
    mtcars |> (`$`)(cyl)
    ```

    Fortunately, you can instead use `dplyr::pull()`:

    ```{r}
    mtcars |> pull(cyl)
    ```

- When calling a function with no arguments, `%>%` lets you drop the parentheses and write (e.g.) `x %>% ungroup`.
    The parentheses are always required with `|>`.

- With `%>%`, starting a pipe with `.`, like `. %>% group_by(x) %>% summarise(x)`, creates a function rather than immediately performing the pipe.

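Several of the differences listed above can be checked directly. The following is a rough sketch rather than an exhaustive comparison: it uses magrittr plus base R functions on `mtcars`, and the anonymous-function workaround shown for the base pipe is just one possible way to cope without a placeholder (an assumption of this example, not a recommendation from the text above). The names `fit_magrittr`, `fit_base`, `bare_name` and `root_of_sum` are made up for illustration. It assumes R 4.1.0 or later.

```r
library(magrittr)

# Placeholder: magrittr's `.` sends the left-hand side to any argument.
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

# Base pipe in R 4.1: there is no placeholder, so one option is to
# wrap the call in an anonymous function and call it immediately.
fit_base <- mtcars |> (\(d) lm(mpg ~ wt, data = d))()

# Both approaches fit the same model.
all.equal(coef(fit_magrittr), coef(fit_base))

# Parentheses: magrittr accepts a bare function name...
c(1, 4, 9) %>% sqrt

# ...but with the base pipe the same code does not even parse,
# so we parse it from a string and catch the error.
bare_name <- try(parse(text = "c(1, 4, 9) |> sqrt"), silent = TRUE)
inherits(bare_name, "try-error")

# Starting with `.`: magrittr builds a function instead of
# running the pipe straight away.
root_of_sum <- . %>% sum() %>% sqrt()
is.function(root_of_sum)
root_of_sum(c(9, 16))
```

The final call returns 5, the square root of 9 + 16, which confirms that `root_of_sum` is an ordinary function that applies the piped steps to whatever you pass in.
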