Polish pipes chapter
parent
6376a68ebf
commit
1b2a1b4b35
|
@@ -6,80 +6,105 @@ status("restructuring")
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
Pipes are a powerful tool for clearly expressing a sequence of multiple operations.
|
The pipe, `|>`, is a powerful tool for clearly expressing a sequence of multiple operations.
|
||||||
We briefly introduced them in the previous chapter but before going too much farther I wanted to explain a little more about how they work and give a splash of history.
|
We briefly introduced it in the previous chapter, but before going much further I want to give a little more motivation, discuss another important pipe (`%>%`), and cover one challenge of the pipe.
|
||||||
|
|
||||||
### Prerequisites
|
|
||||||
|
|
||||||
The pipe `|>` is built into R itself so you don't need anything else 😄.
|
|
||||||
But we'll also discuss another historically important pipe, `%>%`, which is provided by the core tidyverse package magrittr.
|
|
||||||
|
|
||||||
```{r setup, message = FALSE}
|
|
||||||
library(tidyverse)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Why use a pipe?
|
## Why use a pipe?
|
||||||
|
|
||||||
The point of the pipe is to help you write code in a way that is easier to read and understand.
|
Each individual dplyr function is quite simple, so to solve complex problems you'll typically need to combine multiple verbs together.
|
||||||
Imagine you wanted to express the following sequence of actions as R code: find keys, unlock car, start car, drive to work, park.
|
The end of the last chapter finished with a moderately complex pipe:
|
||||||
You could write it as nested function calls:
|
|
||||||
|
|
||||||
```{r, eval = FALSE}
|
```{r, eval = FALSE}
|
||||||
park(drive(start_car(find("keys")), to = "work"))
|
flights |>
|
||||||
|
filter(!is.na(arr_delay), !is.na(tailnum)) |>
|
||||||
|
group_by(tailnum) |>
|
||||||
|
summarise(
|
||||||
|
delay = mean(arr_delay, na.rm = TRUE),
|
||||||
|
n = n()
|
||||||
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
But writing it out with the pipe gives it a more natural, easier-to-read structure:
|
Even though this pipe has four steps, it's quite easy to skim to get the main meaning: we start with flights, then filter, then group, then summarize.
|
||||||
|
|
||||||
|
What would happen if we didn't have the pipe?
|
||||||
|
We can still solve this same problem but we'd need to nest each function call inside the previous:
|
||||||
|
|
||||||
```{r, eval = FALSE}
|
```{r, eval = FALSE}
|
||||||
find("keys") |>
|
summarise(
|
||||||
start_car() |>
|
group_by(
|
||||||
drive(to = "work") |>
|
filter(
|
||||||
park()
|
flights,
|
||||||
|
!is.na(arr_delay), !is.na(tailnum)
|
||||||
|
),
|
||||||
|
tailnum
|
||||||
|
),
|
||||||
|
delay = mean(arr_delay, na.rm = TRUE),
|
||||||
|
n = n()
|
||||||
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
Behind the scenes, the pipe actually transforms your code to the first form.
|
Or use a bunch of intermediate variables:
|
||||||
In other words, `x |> f(y)` is equivalent to `f(x, y)`.
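The equivalence stated above can be checked directly. The following is a minimal sketch, assuming R 4.1.0 or later (where `|>` exists); `f`, `x`, and `y` are placeholder names used only for illustration and are never evaluated.

```r
# The base pipe is rewritten by the parser, so quoting a piped
# expression yields the same call as the nested form.
piped  <- quote(x |> f(y))
nested <- quote(f(x, y))
same_call <- identical(piped, nested)

# The same holds for real functions: both forms compute one value.
total_piped  <- c(1, 4, 9) |> sqrt() |> sum()
total_nested <- sum(sqrt(c(1, 4, 9)))

print(piped)        # shows the nested call f(x, y)
print(same_call)
print(total_piped)
```

Because the rewrite happens when the code is parsed, the pipe adds no extra function call at run time.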
|
|
||||||
|
```{r, eval = FALSE}
|
||||||
|
flights1 <- filter(flights, !is.na(arr_delay), !is.na(tailnum))
|
||||||
|
flights2 <- group_by(flights1, tailnum)
|
||||||
|
flights3 <- summarise(flights2,
|
||||||
|
delay = mean(arr_delay, na.rm = TRUE),
|
||||||
|
n = n()
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
While both of these forms have their uses, the pipe generally produces code that is easier to read and easier to write.
|
||||||
|
|
||||||
## magrittr and the `%>%` pipe
|
## magrittr and the `%>%` pipe
|
||||||
|
|
||||||
If you've been using the tidyverse for a while, you might be more familiar with `%>%` than `|>`.
|
If you've been using the tidyverse for a while, you might be more familiar with the `%>%` pipe provided by the **magrittr** package by Stefan Milton Bache.
|
||||||
`%>%` comes from the **magrittr** package by Stefan Milton Bache and has been available since 2014.
|
The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you use the tidyverse:
|
||||||
This pipe was so successful that in 2021 the base pipe, `|>`, was added to R 4.1.0.
|
|
||||||
|
|
||||||
`|>` is inspired by `%>%`, and the tidyverse team was involved in its design.
|
```{r, message = FALSE}
|
||||||
`|>` offers fewer features than `%>%`, but we largely believe this to be a feature.
|
library(tidyverse)
|
||||||
`%>%` was an experiment and included many speculative features that seemed like a good idea at the time, but in hindsight added too much complexity relative to their advantages.
|
|
||||||
The development of the base pipe gave us an opportunity to reset back to the most useful core.
|
|
||||||
|
|
||||||
## Changing the argument
|
mtcars %>%
|
||||||
|
group_by(cyl) %>%
|
||||||
There is one feature that `%>%` has that `|>` currently lacks: a very easy way to change which argument you pass the object to --- you just put a `.` where you want the object on the left of the pipe to go.
|
summarise(n = n())
|
||||||
Ironically this is particularly important for many base functions which were designed well before the pipe existed.
|
|
||||||
|
|
||||||
One particularly challenging example is extracting a single column out of a data frame with `$`.
|
|
||||||
With `%>%` you can write the fairly straightforward:
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
mtcars %>% .$cyl
|
|
||||||
```
|
```
|
||||||
|
|
||||||
But the base pipe requires the rather cryptic:
|
For simple cases `|>` and `%>%` behave identically.
|
||||||
|
So why do we recommend the base pipe?
|
||||||
|
Firstly, because it's part of base R, it's always available for you to use, even when you're not using the tidyverse.
|
||||||
|
Secondly, `|>` is quite a bit simpler than the magrittr pipe.
|
||||||
|
In the 7 years between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we homed in on the core strengths of the pipe, allowing the base implementation to jettison esoteric and relatively unimportant features.
|
||||||
|
|
||||||
```{r}
|
### Key differences
|
||||||
mtcars |> (`$`)(cyl)
|
|
||||||
```
|
|
||||||
|
|
||||||
Fortunately, dplyr provides a way out of this common problem with `pull`:
|
If you haven't used `%>%` you can skip this section; if you have, read on to learn about the most important differences.
|
||||||
|
|
||||||
```{r}
|
- `%>%` allows you to use `.` as a placeholder to control how the object on the left is passed to the function on the right.
|
||||||
mtcars |> pull(cyl)
|
R 4.2.0 will bring `_` as a placeholder, with the additional restriction that it must be supplied to a named argument.
|
||||||
```
|
|
||||||
|
|
||||||
magrittr offers a number of other variations on the pipe that you might want to learn about.
|
- The base pipe `|>` doesn't support any of the more complex uses of `.` such as passing `.` to more than one argument, or the special behavior when used with `.`.
|
||||||
We don't teach them here because none of them has been sufficiently popular that you could reasonably expect a randomly chosen R user to recognize them.
|
|
||||||
|
|
||||||
In R 4.2, the base pipe will gain its own placeholder, `_`.
|
- The base pipe doesn't yet provide a convenient way to use `$` (and similar functions).
|
||||||
Must be named.
|
With magrittr, you can write:
|
||||||
Doesn't solve problem above, but helps out in lots of other places.
|
|
||||||
|
|
||||||
Expect it to continue to evolve.
|
```{r}
|
||||||
|
mtcars %>% .$cyl
|
||||||
|
```
|
||||||
|
|
||||||
|
With the base pipe you instead need the rather cryptic:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
mtcars |> (`$`)(cyl)
|
||||||
|
```
|
||||||
|
|
||||||
|
Fortunately, you can instead use `dplyr::pull()`:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
mtcars |> pull(cyl)
|
||||||
|
```
|
||||||
|
|
||||||
|
- When calling a function with no arguments, `%>%` lets you drop the parentheses and write (e.g.) `x %>% ungroup`.
|
||||||
|
The parentheses are always required with `|>`.
|
||||||
|
|
||||||
|
- Starting a pipe with `.`, like `. %>% group_by(x) %>% summarise(x)` would create a function rather than immediately performing the pipe.
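Two of the differences listed above can be sketched with base R alone. This assumes R 4.2.0 or later for the `_` placeholder. The anonymous-function wrapper is shown only as one possible way to reach `$` without dplyr, not as an approach this chapter recommends.

```r
# `_` must be supplied to a named argument; here it fills `data`,
# which is not the first argument of lm().
fit <- mtcars |> lm(mpg ~ cyl, data = _)

# The right-hand side of `|>` must be a call, so an anonymous
# function has to be wrapped in parentheses and then called.
cyl <- mtcars |> (\(df) df$cyl)()

# Parentheses are always required, even with no extra arguments.
n_rows <- mtcars |> nrow()

print(class(fit))
print(head(cyl))
print(n_rows)
```

For column extraction in particular, `pull()` remains the more readable choice whenever dplyr is already loaded.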
|
||||||
|
|