Minor corrections (#1244)
parent 01b8566680
commit 5d912aaed8
@@ -35,7 +35,7 @@ library(nycflights13)
 
 ## Keys
 
-To understand joins, you need to first understand how two tables can be connected through a pair of keys, with on each table.
+To understand joins, you need to first understand how two tables can be connected through a pair of keys, within each table.
 In this section, you'll learn about the two types of key and see examples of both in the datasets of the nycflights13 package.
 You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.
 
@@ -138,7 +138,7 @@ weather |>
 ### Surrogate keys
 
 So far we haven't talked about the primary key for `flights`.
-It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if have some way to describe them to others.
+It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if we have some way to describe them to others.
 
 After a little thinking and experimentation, we determined that there are three variables that together uniquely identify each flight:
 
@@ -194,7 +194,7 @@ Surrogate keys can be particular useful when communicating to other humans: it's
 ## Basic joins {#sec-mutating-joins}
 
 Now that you understand how data frames are connected via keys, we can start using joins to better understand the `flights` dataset.
-dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, and `anti_join()`.
+dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, `anti_join()`, and `full_join()`.
 They all have the same interface: they take a pair of data frames (`x` and `y`) and return a data frame.
 The order of the rows and columns in the output is primarily determined by `x`.
 
@@ -321,7 +321,7 @@ airports |>
 **Anti-joins** are the opposite: they return all rows in `x` that don't have a match in `y`.
 They're useful for finding missing values that are **implicit** in the data, the topic of @sec-missing-implicit.
 Implicitly missing values don't show up as `NA`s but instead only exist as an absence.
-For example, we can find rows that as missing from `airports` by looking for flights that don't have a matching destination airport:
+For example, we can find rows that are missing from `airports` by looking for flights that don't have a matching destination airport:
 
 ```{r}
 flights2 |>
@@ -404,7 +404,7 @@ read_excel("data/bake-sale.xlsx")
 
 ### Formatted output
 
-The readxl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
+The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
 Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
 For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
 However, this is ok.