Mention distinct
parent bdc3555b9a
commit 1477dd6fd3

@@ -96,6 +96,7 @@ Let's dive in!

The most important verbs that operate on rows are `filter()`, which changes which rows are present without changing their order, and `arrange()`, which changes the order of the rows without changing which are present.
Both functions only affect the rows, and the columns are left unchanged.
We'll also discuss `distinct()`, which finds rows with unique values; unlike `arrange()` and `filter()`, it can also optionally modify the columns.

### `filter()`

@@ -197,6 +198,23 @@ flights |>
  arrange(desc(arr_delay))
```

### `distinct()`

`distinct()` finds all the unique rows in a dataset, so in a technical sense, it primarily operates on the rows.
Most of the time, however, you'll want the distinct combination of some variables, so you can also optionally supply column names:

```{r}
# This would remove any duplicate rows if there were any
flights |>
  distinct()

# This finds all unique origin and destination pairs.
flights |>
  distinct(origin, dest)
```
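
If you'd rather keep the other columns when picking out unique combinations, `distinct()` has a `.keep_all` argument. Here's a minimal sketch on a small made-up data frame (`trips` and its values are hypothetical stand-ins, not part of `flights`):

```{r}
library(dplyr)

# Hypothetical toy data standing in for `flights`
trips <- tibble(
  origin = c("JFK", "JFK", "LGA", "JFK"),
  dest   = c("LAX", "LAX", "ORD", "SFO"),
  delay  = c(5, 12, -3, 20)
)

# Only the named columns survive
trips |>
  distinct(origin, dest)

# .keep_all = TRUE keeps every column, using the first row of each pair
kept <- trips |>
  distinct(origin, dest, .keep_all = TRUE)
kept
```

With `.keep_all = TRUE`, `distinct()` keeps the first row it finds for each combination and discards the rest.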

Note that if you want to find the number of duplicates, or rows that weren't duplicated, you're better off swapping `distinct()` for `count()` and then filtering as needed.
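
As a rough sketch of that approach, on a small made-up data frame (`trips` is hypothetical, not part of `flights`), you could count each combination and then filter on the resulting `n` column:

```{r}
library(dplyr)

# Hypothetical toy data: the JFK-LAX pair appears twice
trips <- tibble(
  origin = c("JFK", "JFK", "LGA", "JFK"),
  dest   = c("LAX", "LAX", "ORD", "SFO")
)

# One row per origin/dest pair, with the number of occurrences in `n`
pair_counts <- trips |>
  count(origin, dest)

# Pairs that occur more than once
duplicated_pairs <- pair_counts |>
  filter(n > 1)

# Pairs that occur exactly once
unique_pairs <- pair_counts |>
  filter(n == 1)
```

Unlike `distinct()`, `count()` keeps track of how often each combination occurs, so the same result can answer both questions.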

### Exercises

1. Find all flights that
@@ -213,10 +231,12 @@ flights |>

3. Sort `flights` to find the fastest flights (Hint: try sorting by a calculation).

4. Which flights traveled the farthest?
   Which traveled the shortest?
4. Was there a flight on every day of 2013?

5. Does it matter what order you used `filter()` and `arrange()` in if you're using both?
5. Which flights traveled the farthest distance?
   Which traveled the least distance?

6. Does it matter what order you used `filter()` and `arrange()` in if you're using both?
   Why/why not?
   Think about the results and how much work the functions would have to do.


@@ -224,6 +244,7 @@ flights |>

There are four important verbs that affect the columns without changing the rows: `mutate()`, `select()`, `rename()`, and `relocate()`.
`mutate()` creates new columns that are functions of the existing columns; `select()`, `rename()`, and `relocate()` change which columns are present, their names, or their positions.
We'll also discuss `pull()` since it allows you to get a column out of a data frame.

### `mutate()` {#sec-mutate}