Move more from missing values; fix build failure
parent 66675a2600
commit 005969424e
@ -12,7 +12,8 @@ You'll find logical vectors directly in data relatively rarely, but despite that
We'll begin with the most common way of creating logical vectors: numeric comparisons.
Then we'll talk about using Boolean algebra to combine different logical vectors, and some useful summaries for logical vectors.
We'll finish off with some other tool for making conditional changes
We'll finish off with some other tool for making conditional changes.
Along the way, you'll also learn a little more about working with missing values, `NA`.

### Prerequisites

@ -269,6 +270,10 @@ Similar reasoning applies with `NA & FALSE`.
### Exercises

1. Find all flights where `arr_delay` is missing but `dep_delay` is not. Find all flights where neither `arr_time` nor `sched_arr_time` are missing, but `arr_delay` is.
2. How many flights have a missing `dep_time`? What other variables are missing? What might these rows represent?
3. How could you use `arrange()` to sort all missing values to the start? (Hint: use `!is.na()`).
4. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).
5. Look at the number of cancelled flights per day. Is there a pattern? Is the proportion of cancelled flights related to the average delay?

## Summaries

@ -416,3 +421,4 @@ df |> filter(cumall(!(balance < 0)))
###

##

@ -18,31 +18,6 @@ Missing topics:
- `coalesce()` and `na_if()`

## Basics

### Missing values {#missing-values-filter}

If you want to determine if a value is missing, use `is.na()`:

```{r}
is.na(x)
```
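
The chunk above relies on an `x` defined elsewhere in the chapter. As a minimal sketch, assuming a hypothetical three-element vector (the values are illustrative and not from the book), this shows what `is.na()` returns and why comparing against `NA` with `==` does not work:

```r
# Hypothetical vector; the values are illustrative only.
x <- c(1, NA, 3)

# is.na() returns TRUE at each position where the value is missing.
is.na(x)
#> [1] FALSE  TRUE FALSE

# Comparing with == does not detect missing values:
# any comparison involving NA is itself NA.
x == NA
#> [1] NA NA NA
```
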

### Exercises

1. How many flights have a missing `dep_time`?
What other variables are missing?
What might these rows represent?

2. How could you use `arrange()` to sort all missing values to the start?
(Hint: use `!is.na()`).

3. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).

4. Look at the number of cancelled flights per day.
Is there a pattern?
Is the proportion of cancelled flights related to the average delay?

## Explicit vs implicit missing values {#missing-values-tidy}

Changing the representation of a dataset brings up an important subtlety of missing values.
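
The diff context ends here, so the chapter's own example is not shown. As a rough sketch of the distinction the heading names, assuming a hypothetical `stocks` data frame (the column names and values are made up for illustration), the base R code below contrasts a value that is explicitly `NA` with a row that is simply absent, and uses `expand.grid()` plus `merge()` to turn the absent row into an explicit `NA`:

```r
# Hypothetical data: one price per year and quarter.
stocks <- data.frame(
  year  = c(2020, 2020, 2020, 2021, 2021),
  qtr   = c(1, 2, 3, 1, 2),
  price = c(1.9, NA, 0.4, 0.9, 1.2)
)

# Explicitly missing: the row for 2020 Q2 exists, but its price is NA.
sum(is.na(stocks$price))
#> [1] 1

# Implicitly missing: there is no row at all for 2021 Q3.
# Build every year/quarter combination, then left-join the data onto it.
all_combos <- expand.grid(
  year = unique(stocks$year),
  qtr  = unique(stocks$qtr)
)
full <- merge(all_combos, stocks, all.x = TRUE)

# The absent row now appears with an explicit NA.
sum(is.na(full$price))
#> [1] 2
```

Whether the chapter makes the same point with this approach is an assumption; the sketch only illustrates that a change of representation can surface values that were missing by absence.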