Implement suggestions and typo fixes, closes #1028
parent 7618f2e3cb
commit ff05d630b8
28
logicals.qmd
@@ -14,13 +14,13 @@ Logical vectors are the simplest type of vector because each element can only be
 It's relatively rare to find logical vectors in your raw data, but you'll create and manipulate in the course of almost every analysis.

 We'll begin by discussing the most common way of creating logical vectors: with numeric comparisons.
-Then you'll learn about how you can use use Boolean algebra to combine different logical vectors, as well some useful summaries.
+Then you'll learn about how you can use Boolean algebra to combine different logical vectors, as well as some useful summaries.
 We'll finish off with some tools for making conditional changes, and a cool hack for turning logical vectors into groups.

 ### Prerequisites

-Most of the functions you'll learn about in this chapter are provided by base R, so we don't need the tidyverse, but but we'll still load it so we can use `mutate()`, `filter()`, and friends to work with data frames.
-We'll also continue to draw examples from the nyclights13 dataset.
+Most of the functions you'll learn about in this chapter are provided by base R, so we don't need the tidyverse, but we'll still load it so we can use `mutate()`, `filter()`, and friends to work with data frames.
+We'll also continue to draw examples from the nycflights13 dataset.

 ```{r}
 #| label: setup
@@ -38,7 +38,7 @@ x <- c(1, 2, 3, 5, 7, 11, 13)
 x * 2
 ```

-This makes it easier to explain individual functions at the cost to making it harder to see how it might apply to your data problems.
+This makes it easier to explain individual functions at the cost of making it harder to see how it might apply to your data problems.
 Just remember that any manipulation we do to a free-floating vector, you can do to a variable inside data frame with `mutate()` and friends.

 ```{r}
@@ -50,7 +50,7 @@ df |>
 ## Comparisons

 A very common way to create a logical vector is via a numeric comparison with `<`, `<=`, `>`, `>=`, `!=`, and `==`.
-So far, we've mostly create logical variables transiently within `filter()` --- they are computed, used, and then throw away.
+So far, we've mostly created logical variables transiently within `filter()` --- they are computed, used, and then thrown away.
 For example, the following filter finds all daytime departures that leave roughly on time:

 ```{r}
@@ -134,7 +134,7 @@ The most confusing result is this one:
 NA == NA
 ```

-It's easiest to understand why this is true if we artificial supply a little more context:
+It's easiest to understand why this is true if we artificially supply a little more context:

 ```{r}
 # Let x be Mary's age. We don't know how old she is.
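Review note: the hunk above edits the sentence that introduces why `NA == NA` evaluates to `NA`. A minimal base R sketch of that behavior follows; the variable names `mary_age` and `john_age` are hypothetical and chosen only for illustration.

```r
# Two unknown ages, both recorded as missing (hypothetical names).
mary_age <- NA
john_age <- NA

# Comparing two unknown values is itself unknown, so the result is NA.
same_age <- mary_age == john_age

# To test for missingness, use is.na() rather than `== NA`.
mary_missing <- is.na(mary_age)
```

Here `same_age` is `NA` and `mary_missing` is `TRUE`, which is why the chapter reaches for `is.na()` when filtering on missing values.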
@@ -392,13 +392,13 @@ flights |>
   filter(arr_delay > 0) |>
   group_by(year, month, day) |>
   summarise(
-    ahead = mean(arr_delay),
+    behind = mean(arr_delay),
     n = n(),
     .groups = "drop"
   )
 ```

-This works, but what if we wanted to also compute the average delay for flights that left early?
+This works, but what if we wanted to also compute the average delay for flights that arrived early?
 We'd need to perform a separate filter step, and then figure out how to combine the two data frames together[^logicals-3].
 Instead you could use `[` to perform an inline filtering: `arr_delay[arr_delay > 0]` will yield only the positive arrival delays.

@@ -410,8 +410,8 @@ This leads to:
 flights |>
   group_by(year, month, day) |>
   summarise(
-    ahead = mean(arr_delay[arr_delay > 0], na.rm = TRUE),
-    behind = mean(arr_delay[arr_delay < 0], na.rm = TRUE),
+    behind = mean(arr_delay[arr_delay > 0], na.rm = TRUE),
+    ahead = mean(arr_delay[arr_delay < 0], na.rm = TRUE),
     n = n(),
     .groups = "drop"
   )
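Review note: the hunk above swaps the `ahead` and `behind` labels so that positive arrival delays count as "behind" schedule. A small base R sketch of the same inline filtering, run on a made-up vector instead of the `flights` data (the values are assumptions for illustration only):

```r
# Hypothetical arrival delays in minutes; NA marks a missing value.
arr_delay <- c(-12, 5, NA, 30, -3, 0)

# Positive delays: the flight arrived late, i.e. behind schedule.
# The comparison yields NA for the missing delay, and `[` keeps that NA,
# which is why na.rm = TRUE is still needed.
behind <- mean(arr_delay[arr_delay > 0], na.rm = TRUE)

# Negative delays: the flight arrived early, i.e. ahead of schedule.
ahead <- mean(arr_delay[arr_delay < 0], na.rm = TRUE)
```

With these values `behind` is 17.5 and `ahead` is -7.5, and the zero-delay flight contributes to neither mean.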
@@ -468,7 +468,7 @@ if_else(is.na(x1), y1, x1)
 ```

 You might have noticed a small infelicity in our labeling: zero is neither positive nor negative.
-We could resolves this by adding an additional `if_else():`
+We could resolve this by adding an additional `if_else():`

 ```{r}
 if_else(x == 0, "0", if_else(x < 0, "-ve", "+ve"), "???")
@@ -481,7 +481,7 @@ Instead, you can switch to `dplyr::case_when()`.

 dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different computations.
 It has a special syntax that unfortunately looks like nothing else you'll use in the tidyverse.
-it takes pairs that look like `condition ~ output`.
+It takes pairs that look like `condition ~ output`.
 `condition` must be a logical vector; when it's `TRUE`, `output` will be used.

 This means we could recreate our previous nested `if_else()` as follows:
@@ -521,7 +521,7 @@ And note that if multiple conditions match, only the first will be used:

 ```{r}
 case_when(
-  x > 0 ~ "-ve",
+  x > 0 ~ "+ve",
   x > 3 ~ "big"
 )
 ```
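Review note: the hunk above corrects the label in the "first match wins" example. A short sketch of that rule, assuming the dplyr package is installed; the input vector `x` and the name `labels` are made up for illustration.

```r
library(dplyr)

# Hypothetical input: one negative, two positive, one missing value.
x <- c(-1, 2, 5, NA)

# 5 satisfies both conditions, but only the first matching branch is used,
# so it is labelled "+ve" and never "big".
# Values that match no condition (-1 and NA) come back as NA.
labels <- case_when(
  x > 0 ~ "+ve",
  x > 3 ~ "big"
)
```

Because `x > 0` is listed first, the `"big"` branch is unreachable in this example; putting `x > 3` first would let it match.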
@@ -549,7 +549,7 @@ flights |>
 Before we move on to the next chapter, I want to show you one last trick.
 I don't know exactly how to describe it, and it feels a little magical, but it's super handy so I wanted to make sure you knew about it.
 Sometimes you want to divide your dataset up into groups based on the occurrence of some event.
-For example, when you're looking at website data it's common to want to break up events into sessions, where a session is defined an a gap of more than x minutes since the last activity.
+For example, when you're looking at website data, it's common to want to break up events into sessions, where a session is defined as a gap of more than x minutes since the last activity.

 Here's some made up data that illustrates the problem.
 I've computed the time lag between the events, and figured out if there's a gap that's big enough to qualify.