Merge branch 'master' of github.com:hadley/r4ds
commit
43fcab68c9
@@ -316,7 +316,7 @@ So far, the pairs of tables have always been joined by a single variable, and th
 a suffix.
 
 * A named character vector: `by = c("a" = "b")`. This will
-match variable `a` in table `x` to variable `y` in table `b`. The
+match variable `a` in table `x` to variable `b` in table `y`. The
 variables from `x` will be used in the output.
 
 For example, if we want to draw a map we need to combine the flights data
@@ -429,7 +429,7 @@ Graphically, a semi-join looks like this:
 knitr::include_graphics("diagrams/join-semi.png")
 ```
 
-Only the existence of a match is important; it doesn't match what observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
+Only the existence of a match is important; it doesn't matter which observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
 
 ```{r, echo = FALSE, out.width = "50%"}
 knitr::include_graphics("diagrams/join-semi-many.png")
@@ -467,7 +467,7 @@ flights %>%
 The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.
 
 1. Start by identifying the variables that form the primary key in each table.
-You should usually do this based on your understand of the data, not
+You should usually do this based on your understanding of the data, not
 empirically by looking for a combination of variables that give a
 unique identifier. If you just look for variables without thinking about
 what they mean, you might get (un)lucky and find a combination that's
@@ -490,7 +490,7 @@ The data you've been working with in this chapter has been cleaned up so that yo
 use of inner vs. outer joins, carefully considering whether or not you
 want to drop rows that don't have a match.
 
-Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlikely at the number of dropped rows might exactly equal the number of duplicated rows!
+Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlucky as the number of dropped rows might exactly equal the number of duplicated rows!
 
 ## Set operations {#set-operations}
 