@@ -54,7 +54,7 @@ You can use the nycflights13 package to learn about relational data. nycflights1

One way to show the relationships between the different tables is with a drawing:

```{r, echo = FALSE, out.width = "75%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/relational-nycflights.png")
```

@@ -176,7 +176,7 @@ The following sections explain, in detail, how mutating joins work. You'll start

To help you learn how joins work, I'm going to represent data frames visually:

```{r, echo = FALSE, out.width = "25%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-setup.png")
```
```{r}
@@ -188,7 +188,7 @@ The coloured column represents the "key" variable: these are used to match the r

A join is a way of connecting each row in `x` to zero, one, or more rows in `y`. The following diagram shows each potential match as an intersection of a pair of lines.

```{r, echo = FALSE, out.width = "35%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-setup2.png")
```

@@ -196,7 +196,7 @@ knitr::include_graphics("diagrams/join-setup2.png")

In an actual join, matches will be indicated with dots. The colour of the dots matches the colour of the keys to remind you that the keys are what's important. The number of dots = the number of matches = the number of rows in the output.

```{r, echo = FALSE, out.width = "70%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-inner.png")
```

@@ -204,7 +204,7 @@ knitr::include_graphics("diagrams/join-inner.png")

The simplest type of join is the __inner join__. An inner join matches pairs of observations whenever their keys are equal:

```{r, echo = FALSE, out.width = "70%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-inner.png")
```
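
As a minimal sketch of the inner join described above, assuming dplyr and tibble are installed. The tables `x` and `y` and their columns `val_x` and `val_y` are made up for illustration and are not part of nycflights13:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch); `key` is the join key.
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Keys 1 and 2 appear in both tables, so they are the only matches.
inner <- inner_join(x, y, by = "key")
nrow(inner)
```

Keys 3 and 4 have no partner in the other table, so neither appears in the result: an inner join drops unmatched rows.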

@@ -230,7 +230,7 @@ These joins work by adding an additional "virtual" observation to each table. Th

Graphically, that looks like:

```{r, echo = FALSE, out.width = "75%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-outer.png")
```
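
The outer joins can be sketched the same way, assuming dplyr and tibble are installed. The tables `x` and `y` are illustrative only; rows without a match are kept and the missing side is filled with `NA`:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch).
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

left  <- left_join(x, y, by = "key")   # keeps every row of x
right <- right_join(x, y, by = "key")  # keeps every row of y
full  <- full_join(x, y, by = "key")   # keeps every row of both tables

# Key 3 exists only in x, so its val_y is NA in the left join.
left
```

Comparing `nrow()` across the three results is a quick way to see which table's rows each join preserves.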

@@ -252,7 +252,7 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
add in additional information as there is typically a one-to-many
relationship.

```{r, echo = FALSE, out.width = "75%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-one-to-many.png")
```
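
A small sketch of the one-to-many case, assuming dplyr and tibble are installed. Here the hypothetical table `x` has duplicated keys while `y` has unique keys, so each row of `y` is repeated for every matching row of `x`:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch): keys repeat in x only.
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     2, "x3",
     1, "x4"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2"
)

# Every row of x finds exactly one match, so the output has as many rows as x.
joined <- left_join(x, y, by = "key")
joined
```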

@@ -270,7 +270,7 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
neither table do the keys uniquely identify an observation. When you join
duplicated keys, you get all possible combinations, the Cartesian product:

```{r, echo = FALSE, out.width = "75%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-many-to-many.png")
```
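
The Cartesian product can be seen in a small sketch, assuming dplyr and tibble are installed. Both tables are hypothetical and key 2 is duplicated in each. Newer dplyr releases may warn about a many-to-many relationship, so the call is wrapped in `suppressWarnings()` here purely to keep the illustration quiet:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch): key 2 repeats in both.
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     2, "x3",
     3, "x4"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     2, "y3",
     3, "y4"
)

# Two rows with key 2 in x times two rows with key 2 in y gives four rows.
joined <- suppressWarnings(left_join(x, y, by = "key"))
joined
```

In real data a result like this usually signals a problem with the keys, so it is worth inspecting the warning rather than silencing it.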

@@ -416,19 +416,19 @@ flights %>% semi_join(top_dest)

Graphically, a semi-join looks like this:

```{r, echo = FALSE, out.width = "50%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-semi.png")
```

Only the existence of a match is important; it doesn't matter which observation is matched. This means that filtering joins never duplicate rows like mutating joins do:

```{r, echo = FALSE, out.width = "50%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-semi-many.png")
```
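
To make the contrast concrete, here is a sketch assuming dplyr and tibble are installed. The tables are hypothetical, and key 2 appears twice in `y`:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch): key 2 repeats in y.
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     2, "y3"
)

# Filtering join: keeps rows of x that have at least one match in y.
semi <- semi_join(x, y, by = "key")

# Mutating join: the x row with key 2 appears once per matching y row.
inner <- inner_join(x, y, by = "key")

c(semi = nrow(semi), inner = nrow(inner))
```

The semi-join result also contains only the columns of `x`, since nothing from `y` is added.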

The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _don't_ have a match:

```{r, echo = FALSE, out.width = "50%"}
```{r, echo = FALSE}
knitr::include_graphics("diagrams/join-anti.png")
```
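
A short sketch of an anti-join, assuming dplyr and tibble are installed and using hypothetical tables. Together, the semi-join and the anti-join split `x` into rows with and without a match:

```r
library(dplyr)

# Illustrative tables (assumed for this sketch).
x <- tibble::tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tibble::tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Key 3 is the only key in x with no match in y.
anti <- anti_join(x, y, by = "key")
semi <- semi_join(x, y, by = "key")

anti
```

Anti-joins are handy for diagnosing mismatched keys, because they list exactly the rows a join would fail to match.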