Merge branch 'master' of github.com:hadley/r4ds

hadley 2016-01-21 09:51:09 -06:00
commit 43fcab68c9
1 changed file with 4 additions and 4 deletions


@@ -316,7 +316,7 @@ So far, the pairs of tables have always been joined by a single variable, and th
 a suffix.
 * A named character vector: `by = c("a" = "b")`. This will
-match variable `a` in table `x` to variable `y` in table `b`. The
+match variable `a` in table `x` to variable `b` in table `y`. The
 variables from `x` will be used in the output.
 For example, if we want to draw a map we need to combine the flights data
@@ -429,7 +429,7 @@ Graphically, a semi-join looks like this:
 knitr::include_graphics("diagrams/join-semi.png")
 ```
-Only the existence of a match is important; it doesn't match what observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
+Only the existence of a match is important; it doesn't matter which observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
 ```{r, echo = FALSE, out.width = "50%"}
 knitr::include_graphics("diagrams/join-semi-many.png")
@@ -467,7 +467,7 @@ flights %>%
 The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.
 1. Start by identifying the variables that form the primary key in each table.
-You should usually do this based on your understand of the data, not
+You should usually do this based on your understanding of the data, not
 empirically by looking for a combination of variables that give a
 unique identifier. If you just look for variables without thinking about
 what they mean, you might get (un)lucky and find a combination that's
@@ -490,7 +490,7 @@ The data you've been working with in this chapter has been cleaned up so that yo
 use of inner vs. outer joins, carefully considering whether or not you
 want to drop rows that don't have a match.
-Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlikely at the number of dropped rows might exactly equal the number of duplicated rows!
+Be aware that simply checking the number of rows before and after the join is not sufficient to ensure that your join has gone smoothly. If you have an inner join with duplicate keys in both tables, you might get unlucky as the number of dropped rows might exactly equal the number of duplicated rows!
 ## Set operations {#set-operations}