Fix/data-transform (#1398)

* fix wrong references, inconsistency between sentence and code, and typos

* Update data-transform.qmd

* Update data-transform.qmd

* Update data-transform.qmd

* Update logicals.qmd

---------

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
Mitsuo Shiota 2023-04-10 12:21:13 +09:00 committed by GitHub
parent b9f4ad61c3
commit e5a847f7b3
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 6 additions and 6 deletions

data-transform.qmd

@@ -58,7 +58,7 @@ glimpse(flights)
 ```
 In both views, the variables names are followed by abbreviations that tell you the type of each variable: `<int>` is short for integer, `<dbl>` is short for double (aka real numbers), `<chr>` for character (aka strings), and `<dttm>` for date-time.
-These are important because the operations you can perform on a column depend so much on its "type", and these types are used to organize the chapters in the next section of the book.
+These are important because the operations you can perform on a column depend so much on its "type".
 ### dplyr basics
@@ -102,7 +102,7 @@ We'll also discuss `distinct()` which finds rows with unique values but unlike `
 `filter()` allows you to keep rows based on the values of the columns[^data-transform-1].
 The first argument is the data frame.
 The second and subsequent arguments are the conditions that must be true to keep the row.
-For example, we could find all flights that arrived more than 120 minutes (two hours) late:
+For example, we could find all flights that departed more than 120 minutes (two hours) late:
 [^data-transform-1]: Later, you'll learn about the `slice_*()` family which allows you to choose rows based on their positions.
@@ -225,7 +225,7 @@ flights |>
 ### Exercises
-1. In a single pipeline, find all flights that meet all of the following conditions:
+1. In a single pipeline, find all flights that meet each of the following conditions:
 - Had an arrival delay of two or more hours
 - Flew to Houston (`IAH` or `HOU`)
@@ -251,7 +251,7 @@ flights |>
 ## Columns
-There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns, `select()` changes which columns are present; `rename()` changes the names of the columns; and `relocate()` changes the positions of the columns.
+There are four important verbs that affect the columns without changing the rows: `mutate()` creates new columns that are derived from the existing columns, `select()` changes which columns are present, `rename()` changes the names of the columns, and `relocate()` changes the positions of the columns.
 ### `mutate()` {#sec-mutate}
@@ -479,7 +479,7 @@ flights |>
 arrange(desc(speed))
 ```
-Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then group, then summarize.
+Even though this pipeline has four steps, it's easy to skim because the verbs come at the start of each line: start with the `flights` data, then filter, then mutate, then select, then arrange.
 What would happen if we didn't have the pipe?
 We could nest each function call inside the previous call:
@@ -575,7 +575,7 @@ This means subsequent operations will now work "by month".
 ### `summarize()` {#sec-summarize}
 The most important grouped operation is a summary, which, if being used to calculate a single summary statistic, reduces the data frame to have a single row for each group.
-In dplyr, this is operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:
+In dplyr, this operation is performed by `summarize()`[^data-transform-3], as shown by the following example, which computes the average departure delay by month:
 [^data-transform-3]: Or `summarise()`, if you prefer British English.