Suggestions for chapters on logical vectors and numbers (#1212)

Stephan Koenig 2023-01-04 13:31:49 -08:00 committed by GitHub
parent b5d6735959
commit 69b2a265a8
2 changed files with 10 additions and 6 deletions

@@ -480,7 +480,7 @@ Instead, you can switch to `dplyr::case_when()`.
### `case_when()`
-dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different computations.
+dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different conditions.
It has a special syntax that unfortunately looks like nothing else you'll use in the tidyverse.
It takes pairs that look like `condition ~ output`.
`condition` must be a logical vector; when it's `TRUE`, `output` will be used.
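
The `condition ~ output` pairs described above can be illustrated with a minimal sketch. The vector `x` and the labels are made up for illustration and are not part of this commit; the conditions are evaluated in order and the first `TRUE` one determines the output.

```r
library(dplyr)

# Hypothetical toy input, not taken from the chapter
x <- c(-3, 0, 5, NA)

labels <- case_when(
  x == 0   ~ "zero",
  x < 0    ~ "negative",
  x > 0    ~ "positive",
  is.na(x) ~ "missing"   # comparisons with NA are NA, so this pair catches it
)

labels
#> [1] "negative" "zero"     "positive" "missing"
```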

@@ -18,6 +18,11 @@ We'll finish off by covering the summary functions that pair well with `summariz
### Prerequisites
+::: callout-important
+This chapter relies on features only found in dplyr 1.1.0, which is still in development.
+If you want to live on the edge, you can get the dev versions with `devtools::install_github("tidyverse/dplyr")`.
+:::
This chapter mostly uses functions from base R, which are available without loading any packages.
But we still need the tidyverse because we'll use these base R functions inside of tidyverse functions like `mutate()` and `filter()`.
Like in the last chapter, we'll use real examples from nycflights13, as well as toy examples made with `c()` and `tribble()`.
@@ -395,7 +400,7 @@ cut(y, breaks = c(0, 5, 10, 15, 20))
See the documentation for other useful arguments like `right` and `include.lowest`, which control if the intervals are `[a, b)` or `(a, b]` and if the lowest interval should be `[a, b]`.
-### Cumulative and rolling aggregates
+### Cumulative and rolling aggregates {#sec-cumulative-and-rolling-aggregates}
Base R provides `cumsum()`, `cumprod()`, `cummin()`, `cummax()` for running, or cumulative, sums, products, mins and maxes.
dplyr provides `cummean()` for cumulative means.
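
As a quick illustration of the cumulative functions named above, here is a minimal sketch on a made-up vector (`x` is hypothetical and not part of the chapter). Each function returns a vector of the same length as its input, where element `i` summarises elements `1` through `i`.

```r
library(dplyr)

# Hypothetical toy input
x <- c(1, 3, 2, 5)

sums  <- cumsum(x)   # running sum:     1 4 6 11
prods <- cumprod(x)  # running product: 1 3 6 30
mins  <- cummin(x)   # running minimum: 1 1 1 1
maxes <- cummax(x)   # running maximum: 1 3 3 5
means <- cummean(x)  # running mean (dplyr): 1 2 2 2.75
```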
@@ -544,16 +549,15 @@ events
```
But how do we go from that logical vector to something that we can `group_by()`?
-`consecutive_id()` comes to the rescue:
+`cumsum()` from @sec-cumulative-and-rolling-aggregates comes to the rescue as each occurring gap, i.e., `gap` is `TRUE`, increments `group` by one (see @sec-numeric-summaries-of-logicals on the numerical interpretation of logicals):
```{r}
events |> mutate(
-group = consecutive_id(gap)
+group = cumsum(gap)
)
```
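
To see why `cumsum(gap)` yields a grouping variable, here is a self-contained sketch. The `events` data and the 5-unit threshold below are assumptions made for illustration, since the chapter's own `events` definition sits outside this hunk. Because `TRUE` counts as 1 and `FALSE` as 0, the running sum stays flat within a run of events and steps up by one at every gap.

```r
library(dplyr)

# Hypothetical event times with two long pauses (before 10 and before 20)
events <- tibble(time = c(0, 1, 2, 10, 11, 20))

events <- events |>
  mutate(
    diff  = time - lag(time, default = first(time)),  # time since previous event
    gap   = diff >= 5,                                # assumed threshold for a "gap"
    group = cumsum(gap)                               # increments at each TRUE
  )

events$group
#> [1] 0 0 0 1 1 2
```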
-`consecutive_id()` starts a new group every time one of its arguments changes.
-That makes it useful both here, with logical vectors, and in many other place.
+Another approach for creating grouping variables is `consecutive_id()`, which starts a new group every time one of its arguments changes.
For example, inspired by [this stackoverflow question](https://stackoverflow.com/questions/27482712), imagine you have a data frame with a bunch of repeated values:
```{r}