Update logicals.qmd (#1166)
parent 5750b23270
commit bfaa1414f5
38
logicals.qmd
|
@ -11,7 +11,7 @@ status("polishing")
|
|||
|
||||
In this chapter, you'll learn tools for working with logical vectors.
|
||||
Logical vectors are the simplest type of vector because each element can only be one of three possible values: `TRUE`, `FALSE`, and `NA`.
|
||||
It's relatively rare to find logical vectors in your raw data, but you'll create and manipulate in the course of almost every analysis.
|
||||
It's relatively rare to find logical vectors in your raw data, but you'll create and manipulate them in the course of almost every analysis.
|
||||
|
||||
We'll begin by discussing the most common way of creating logical vectors: with numeric comparisons.
|
||||
Then you'll learn about how you can use Boolean algebra to combine different logical vectors, as well as some useful summaries.
|
||||
|
@ -39,7 +39,7 @@ x * 2
|
|||
```
|
||||
|
||||
This makes it easier to explain individual functions at the cost of making it harder to see how it might apply to your data problems.
|
||||
Just remember that any manipulation we do to a free-floating vector, you can do to a variable inside data frame with `mutate()` and friends.
|
||||
Just remember that any manipulation we do to a free-floating vector, you can do to a variable inside a data frame with `mutate()` and friends.
|
||||
|
||||
```{r}
|
||||
df <- tibble(x)
|
||||
|
@ -102,7 +102,7 @@ x == c(1, 2)
|
|||
|
||||
What's going on?
|
||||
Computers store numbers with a fixed number of decimal places so there's no way to exactly represent 1/49 or `sqrt(2)` and subsequent computations will be very slightly off.
|
||||
We can see the exact values by calling `print()` with the the `digits`[^logicals-1] argument:
|
||||
We can see the exact values by calling `print()` with the `digits`[^logicals-1] argument:
|
||||
|
||||
[^logicals-1]: R normally calls print for you (i.e. `x` is a shortcut for `print(x)`), but calling it explicitly is useful if you want to provide other arguments.
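A minimal sketch of the floating-point issue described above, using base R only. The vector `x` and the `1e-8` tolerance are illustrative assumptions, not values taken from this hunk:

```r
# Two values that "should" be exactly 1 and 2, computed in floating point.
x <- c(1 / 49 * 49, sqrt(2)^2)

# Exact equality fails for both elements.
exact <- x == c(1, 2)
exact

# Printing with more digits reveals the tiny discrepancies.
print(x, digits = 16)

# Comparing within a tolerance instead; the threshold is an arbitrary choice.
close_enough <- abs(x - c(1, 2)) < 1e-8
close_enough
```

Comparing within a tolerance sidesteps the problem because it asks whether two numbers are nearly equal rather than identical down to the last bit.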
|
||||
|
||||
|
@ -148,7 +148,7 @@ x == y
|
|||
# We don't know!
|
||||
```
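The same rule applies element by element to a whole vector. A small sketch with a made-up vector `x` (not from the flights data), contrasting `==` with base R's `is.na()`:

```r
# A toy vector with one missing value.
x <- c(1, NA, 3)

# Comparing anything with NA gives NA, so every element is NA.
eq_na <- x == NA
eq_na

# is.na() asks the question directly and returns TRUE or FALSE.
is_missing <- is.na(x)
is_missing
```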
|
||||
|
||||
So if you want to find all flights with `dep_time` is missing, the following code doesn't work because `dep_time == NA` will yield a `NA` for every single row, and `filter()` automatically drops missing values:
|
||||
So if you want to find all flights where `dep_time` is missing, the following code doesn't work because `dep_time == NA` will yield `NA` for every single row, and `filter()` automatically drops missing values:
|
||||
|
||||
```{r}
|
||||
flights |>
|
||||
|
@ -197,7 +197,7 @@ We'll come back to cover missing values in more depth in @sec-missing-values.
|
|||
## Boolean algebra
|
||||
|
||||
Once you have multiple logical vectors, you can combine them together using Boolean algebra.
|
||||
In R, `&` is "and", `|` is "or", and `!` is "not", and `xor()` is exclusive or[^logicals-2].
|
||||
In R, `&` is "and", `|` is "or", `!` is "not", and `xor()` is exclusive or[^logicals-2].
|
||||
@fig-bool-ops shows the complete set of Boolean operations and how they work.
|
||||
|
||||
[^logicals-2]: That is, `xor(x, y)` is true if x is true, or y is true, but not both.
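As a quick illustration, here are the four operators applied to two toy vectors. The vectors `x` and `y` are made up for this sketch and cover every combination of `TRUE` and `FALSE`:

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

and_xy <- x & y      # TRUE only where both are TRUE
or_xy  <- x | y      # TRUE where at least one is TRUE
not_x  <- !x         # flips each element
xor_xy <- xor(x, y)  # TRUE where exactly one is TRUE

and_xy
or_xy
not_x
xor_xy
```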
|
||||
|
@ -225,7 +225,7 @@ knitr::include_graphics("diagrams/transform.png", dpi = 270)
|
|||
As well as `&` and `|`, R also has `&&` and `||`.
|
||||
Don't use them in dplyr functions!
|
||||
These are called short-circuiting operators and only ever return a single `TRUE` or `FALSE`.
|
||||
They're important for programming, not data science
|
||||
They're important for programming, not data science.
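A short sketch of the difference, using made-up vectors `a` and `b`. The call to `stop()` is there only to show that the right-hand side of `&&` is never evaluated once the answer is known:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# `&` is vectorised: one result per element.
elementwise <- a & b
elementwise

# `&&` works on single values and stops as soon as the result is decided,
# so the stop() call on the right is never run.
short <- FALSE && stop("never evaluated")
short
```

Because `&&` returns a single value, it suits `if` statements, while `&` is the one to use on columns inside `filter()` or `mutate()`.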
|
||||
|
||||
### Missing values {#sec-na-boolean}
|
||||
|
||||
|
@ -249,7 +249,7 @@ Similar reasoning applies with `NA & FALSE`.
|
|||
### Order of operations
|
||||
|
||||
Note that the order of operations doesn't work like English.
|
||||
Take the following code finds all flights that departed in November or December:
|
||||
Take the following code that finds all flights that departed in November or December:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
|
@ -258,7 +258,7 @@ flights |>
|
|||
filter(month == 11 | month == 12)
|
||||
```
|
||||
|
||||
You might be tempted to write it like you'd say in English: "find all flights that departed in November or December":
|
||||
You might be tempted to write it like you'd say in English: "Find all flights that departed in November or December":
|
||||
|
||||
```{r}
|
||||
flights |>
|
||||
|
@ -267,9 +267,9 @@ flights |>
|
|||
|
||||
This code doesn't error but it also doesn't seem to have worked.
|
||||
What's going on?
|
||||
Here R first evaluates `month == 11` creating a logical vector, which we call `nov`.
|
||||
Here, R first evaluates `month == 11` creating a logical vector, which we call `nov`.
|
||||
Then it computes `nov | 12`.
|
||||
When you use a number with a logical operator it converts everything apart from 0 to TRUE, so this is equivalent to `nov | TRUE` which will always be `TRUE`, so every row will be selected:
|
||||
When you use a number with a logical operator it converts everything apart from 0 to `TRUE`, so this is equivalent to `nov | TRUE` which will always be `TRUE`, so every row will be selected:
|
||||
|
||||
```{r}
|
||||
flights |>
|
||||
|
@ -317,7 +317,7 @@ flights |>
|
|||
|
||||
1. Find all flights where `arr_delay` is missing but `dep_delay` is not. Find all flights where neither `arr_time` nor `sched_arr_time` are missing, but `arr_delay` is.
|
||||
2. How many flights have a missing `dep_time`? What other variables are missing in these rows? What might these rows represent?
|
||||
3. Assuming that a missing `dep_time` implies that a flight is cancelled, look at the number of cancelled flights per day. Is there a pattern? Is there a connection between the proportion of cancelled flights and average delay of non-cancelled flights?
|
||||
3. Assuming that a missing `dep_time` implies that a flight is cancelled, look at the number of cancelled flights per day. Is there a pattern? Is there a connection between the proportion of cancelled flights and the average delay of non-cancelled flights?
|
||||
|
||||
## Summaries {#sec-logical-summaries}
|
||||
|
||||
|
@ -350,7 +350,7 @@ That leads us to the numeric summaries.
|
|||
|
||||
When you use a logical vector in a numeric context, `TRUE` becomes 1 and `FALSE` becomes 0.
|
||||
This makes `sum()` and `mean()` very useful with logical vectors because `sum(x)` will give the number of `TRUE`s and `mean(x)` the proportion of `TRUE`s.
|
||||
That lets us see the distribution of delays across the days of the year as shown in @fig-prop-delayed-dist.
|
||||
That lets us see the distribution of delays across the days of the year as shown in @fig-prop-delayed-dist.
|
||||
|
||||
```{r}
|
||||
#| label: fig-prop-delayed-dist
|
||||
|
@ -369,7 +369,7 @@ flights |>
|
|||
geom_histogram(binwidth = 0.05)
|
||||
```
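To see the arithmetic on a smaller scale, here is a sketch with a toy logical vector. The name `delayed` and its values are invented for illustration:

```r
# Four flights, three of them delayed.
delayed <- c(TRUE, FALSE, TRUE, TRUE)

n_delayed    <- sum(delayed)   # TRUE counts as 1, FALSE as 0
prop_delayed <- mean(delayed)  # the proportion of TRUEs

n_delayed
prop_delayed

# With a missing value, both summaries are NA unless na.rm = TRUE is set.
with_na <- c(delayed, NA)
n_with_na <- sum(with_na)
n_na_rm   <- sum(with_na, na.rm = TRUE)
```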
|
||||
|
||||
Or we could ask how many flights left before 5am, which are often flights that were delayed from the previous day:
|
||||
Or we could ask: "How many flights left before 5am?" These are often flights that were delayed from the previous day:
|
||||
|
||||
```{r}
|
||||
flights |>
|
||||
|
@ -387,7 +387,7 @@ There's one final use for logical vectors in summaries: you can use a logical ve
|
|||
This makes use of the base `[` (pronounced subset) operator, which you'll learn more about in @sec-subset-many.
|
||||
|
||||
Imagine we wanted to look at the average delay just for flights that were actually delayed.
|
||||
One way to do so would be to first filter the flights:
|
||||
One way to do so would be to first filter the flights and then calculate the average delay:
|
||||
|
||||
```{r}
|
||||
flights |>
|
||||
|
@ -424,7 +424,7 @@ Also note the difference in the group size: in the first chunk `n()` gives the n
|
|||
### Exercises
|
||||
|
||||
1. What will `sum(is.na(x))` tell you? How about `mean(is.na(x))`?
|
||||
2. What does `prod()` return when applied to a logical vector? What logical summary function is it equivalent to? What does `min()` return applied to a logical vector? What logical summary function is it equivalent to? Read the documentation and perform a few experiments.
|
||||
2. What does `prod()` return when applied to a logical vector? What logical summary function is it equivalent to? What does `min()` return when applied to a logical vector? What logical summary function is it equivalent to? Read the documentation and perform a few experiments.
|
||||
|
||||
## Conditional transformations
|
||||
|
||||
|
@ -433,7 +433,7 @@ There are two important tools for this: `if_else()` and `case_when()`.
|
|||
|
||||
### `if_else()`
|
||||
|
||||
If you want to use one value when a condition is true and another value when it's `FALSE`, you can use `dplyr::if_else()`[^logicals-4].
|
||||
If you want to use one value when a condition is `TRUE` and another value when it's `FALSE`, you can use `dplyr::if_else()`[^logicals-4].
|
||||
You'll always use the first three arguments of `if_else()`. The first argument, `condition`, is a logical vector, the second, `true`, gives the output when the condition is true, and the third, `false`, gives the output if the condition is false.
|
||||
|
||||
[^logicals-4]: dplyr's `if_else()` is very similar to base R's `ifelse()`.
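A minimal sketch of those three arguments in use, assuming dplyr is installed. The vector `x` and the labels `"+ve"`, `"-ve"`, and `"???"` are chosen only for illustration; the optional fourth argument, `missing`, supplies a value for inputs that are `NA`:

```r
library(dplyr)

x <- c(-3:3, NA)

# condition, true, false: NA in the condition stays NA in the output.
labels <- if_else(x > 0, "+ve", "-ve")
labels

# The optional `missing` argument replaces those NAs.
labels_full <- if_else(x > 0, "+ve", "-ve", missing = "???")
labels_full
```

Note that zero is labeled `"-ve"` here, because `0 > 0` is `FALSE`.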
|
||||
|
@ -468,7 +468,7 @@ y1 <- c(3, NA, 4, 6)
|
|||
if_else(is.na(x1), y1, x1)
|
||||
```
|
||||
|
||||
You might have noticed a small infelicity in our labeling: zero is neither positive nor negative.
|
||||
You might have noticed a small infelicity in our labeling example above: zero is neither positive nor negative.
|
||||
We could resolve this by adding an additional `if_else()`:
|
||||
|
||||
```{r}
|
||||
|
@ -576,8 +576,8 @@ We don't expect you to memorize these rules, but they should become second natur
|
|||
The definition of a logical vector is simple because each value must be either `TRUE`, `FALSE`, or `NA`.
|
||||
But logical vectors provide a huge amount of power.
|
||||
In this chapter, you learned how to create logical vectors with `>`, `<`, `<=`, `>=`, `==`, `!=`, and `is.na()`, how to combine them with `!`, `&`, and `|`, and how to summarize them with `any()`, `all()`, `sum()`, and `mean()`.
|
||||
You also learned the powerful `if_else()` and `case_when()` that allow you to return values depending on the value of a logical vector.
|
||||
You also learned the powerful `if_else()` and `case_when()` functions that allow you to return values depending on the value of a logical vector.
|
||||
|
||||
We'll see logical vectors again and in the following chapters.
|
||||
We'll see logical vectors again and again in the following chapters.
|
||||
For example, in @sec-strings you'll learn about `str_detect(x, pattern)`, which returns a logical vector that's `TRUE` for the elements of `x` that match the `pattern`, and in @sec-dates-and-times you'll create logical vectors from the comparison of dates and times.
|
||||
But for now, we're going to move on to the next most important type of vector: numeric vectors.
|
||||
|
|