Proofing of final sections of relational data

hadley 2016-01-13 09:45:17 -06:00
parent 79b0660040
commit 3a09b18632
1 changed file with 4 additions and 4 deletions

@@ -365,7 +365,7 @@ Filtering joins match observations in the same way as mutating joins, but affect
* `semi_join(x, y)` __keeps__ all observations in `x` that have a match in `y`.
* `anti_join(x, y)` __drops__ all observations in `x` that have a match in `y`.
-Semi joins are useful for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
+Semi-joins are useful for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
```{r}
top_dest <- flights %>%
@@ -382,13 +382,13 @@ flights %>% filter(dest %in% top_dest$dest)
But it's difficult to extend that approach to multiple variables. For example, imagine that you'd found the 10 days with highest average delays. How would you construct the filter statement that used `year`, `month`, and `day` to match it back to `flights`?
-Instead you can use a semi join, which connects the two tables like a mutating join, but instead of adding new columns, only keeps the rows in `x` that have a match in `y`:
+Instead you can use a semi-join, which connects the two tables like a mutating join, but instead of adding new columns, only keeps the rows in `x` that have a match in `y`:
```{r}
flights %>% semi_join(top_dest)
```
-The inverse of a semi join is an anti join. An anti join keeps the rows that _don't_ have a match, and is useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`:
+The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _don't_ have a match, and is useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`:
```{r}
flights %>%
@@ -411,7 +411,7 @@ flights %>%
## Set operations {#set-operations}
-The final type of two-table verb is set operations. Generally, I use these the least frequnetly, but they are occasionally useful when you want to break a single complex filter into simpler pieces that you then combine.
+The final type of two-table verb is set operations. Generally, I use these the least frequently, but they are occasionally useful when you want to break a single complex filter into simpler pieces that you then combine.
All these operations work with a complete row, comparing the values of every variable. These expect the `x` and `y` inputs to have the same variables, and treat the observations like sets: