Proof read alt text
parent
8e97ac0875
commit
fad2d6c940
81
joins.qmd
|
@@ -93,17 +93,14 @@ These relationships are summarized visually in @fig-flights-relationships.
|
|||
#| Variables making up a primary key are coloured grey, and are connected
|
||||
#| to their corresponding foreign keys with arrows.
|
||||
#| fig-alt: >
|
||||
#| Diagram showing the relationships between airports, planes, flights,
|
||||
#| weather, and airlines datasets from the nycflights13 package. The faa
|
||||
#| variable in the airports data frame is connected to the origin and dest
|
||||
#| variables in the flights data frame. The tailnum variable in the planes
|
||||
#| data frame is connected to the tailnum variable in flights. The
|
||||
#| time_hour and origin variables in the weather data frame are connected
|
||||
#| to the variables with the same name in the flights data frame. And
|
||||
#| finally the carrier variable in the airlines data frame is connected
|
||||
#| to the carrier variable in the flights data frame. There are no direct
|
||||
#| connections between airports, planes, airlines, and weather data
|
||||
#| frames.
|
||||
#| The relationships between airports, planes, flights, weather, and
|
||||
#| airlines datasets from the nycflights13 package. airports$faa
|
||||
#| is connected to flights$origin and flights$dest. planes$tailnum
|
||||
#| is connected to the flights$tailnum. weather$time_hour and
|
||||
#| weather$origin are jointly connected to flights$time_hour and
|
||||
#| flights$origin. airlines$carrier is connected to flights$carrier.
|
||||
#| There are no direct connections between airports, planes, airlines,
|
||||
#| and weather data frames.
|
||||
knitr::include_graphics("diagrams/relational.png", dpi = 270)
|
||||
```
|
||||
|
||||
|
@@ -433,9 +430,9 @@ y <- tribble(
|
|||
#| columns map background colour to key value. The grey columns represent
|
||||
#| the "value" columns that are carried along for the ride.
|
||||
#| fig-alt: >
|
||||
#| x and y are two data frames with 2 columns and 3 rows each. The first
|
||||
#| column in each is the key and the second is the value. The contents of
|
||||
#| these data frames are given in the previous code chunk.
|
||||
#| x and y are two data frames with 2 columns and 3 rows, with contents
|
||||
#| as described in the text. The values of the keys are coloured:
|
||||
#| 1 is green, 2 is purple, 3 is orange, and 4 is yellow.
|
||||
|
||||
knitr::include_graphics("diagrams/join/setup.png", dpi = 270)
|
||||
```
|
||||
|
@@ -453,8 +450,8 @@ The rows and columns in the output are primarily determined by `x`, so the `x` t
|
|||
#| fig-alt: >
|
||||
#| x and y are placed at right-angles, with horizontal lines extending
|
||||
#| from x and vertical lines extending from y. There are 3 rows in x and
|
||||
#| 3 rows in y leading to 9 intersections that represent nine potential
|
||||
#| matches.
|
||||
#| 3 rows in y, which leads to nine intersections representing nine
|
||||
#| potential matches.
|
||||
|
||||
knitr::include_graphics("diagrams/join/setup2.png", dpi = 270)
|
||||
```
|
||||
|
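The grid of nine potential matches described in this alt text can be reproduced with a cross join. This is a minimal sketch, assuming dplyr 1.1.0 or later is installed; the keys (1, 2, 3 in `x` and 1, 2, 4 in `y`) come from the alt text in this diff, while the `val_x` and `val_y` values are illustrative assumptions.

```r
library(dplyr)

# Two small data frames; keys follow the alt text, values are assumed.
x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Every row of x paired with every row of y: the full grid of intersections.
potential <- cross_join(x, y)
nrow(potential)

# Only the pairs whose keys are equal would survive an equi join.
equal_keys <- filter(potential, key.x == key.y)
nrow(equal_keys)
```

Counting the rows of `potential` gives the number of intersections in the diagram; `equal_keys` keeps only the pairs that an equi join would treat as matches.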
@@ -473,8 +470,9 @@ We'll come back to non-equi joins in @sec-non-equi-joins.
|
|||
#| An inner join matches each row in `x` to the row in `y` that has the
|
||||
#| same value of `key`. Each match becomes a row in the output.
|
||||
#| fig-alt: >
|
||||
#| Keys 1 and 2 appear in both x and y, so their values are equal and
|
||||
#| we get a match, indicated by a dot. Each dot corresponds to a row
|
||||
#| x and y are placed at right-angles with lines forming a grid of
|
||||
#| potential matches. Keys 1 and 2 appear in both x and y, so we
|
||||
#| get a match, indicated by a dot. Each dot corresponds to a row
|
||||
#| in the output, so the resulting joined data frame has two rows.
|
||||
|
||||
knitr::include_graphics("diagrams/join/inner.png", dpi = 270)
|
||||
|
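The inner join this alt text describes might look like the following in code. It is a sketch under the same assumptions as the diagrams (keys 1, 2, 3 in `x` and 1, 2, 4 in `y`); the value columns are illustrative, and dplyr 1.1.0 or later is assumed for `join_by()`.

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Each dot in the diagram (a pair of equal keys) becomes one output row.
inner <- inner_join(x, y, join_by(key))
inner
```

Keys 3 and 4 each appear in only one table, so neither reaches the output.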
@@ -496,10 +494,11 @@ There are three types of outer joins:
|
|||
#| A visual representation of the left join where every row in `x`
|
||||
#| appears in the output.
|
||||
#| fig-alt: >
|
||||
#| Compared to the inner join, the `y` table gets a new virtual row
|
||||
#| that will match any row in `x` that doesn't otherwise have a match.
|
||||
#| This means that the output now has three rows. For key = 3, which
|
||||
#| matches this virtual row, the value of val_y is NA.
|
||||
#| Compared to the previous diagram showing an inner join, the y table
|
||||
#| gets a new virtual row containing NA that will match any row in x
|
||||
#| that didn't otherwise match. This means that the output now has
|
||||
#| three rows. For key = 3, which matches this virtual row, val_y takes
|
||||
#| value NA.
|
||||
|
||||
knitr::include_graphics("diagrams/join/left.png", dpi = 270)
|
||||
```
|
||||
|
@@ -516,12 +515,9 @@ There are three types of outer joins:
|
|||
#| A visual representation of the right join where every row of `y`
|
||||
#| appears in the output.
|
||||
#| fig-alt: >
|
||||
#| Keys 1 and 2 from x are matched to those in y, key 4 is
|
||||
#| also carried along to the joined result since it's on the right data
|
||||
#| frame, but key 3 from x is not carried along since it's on the left
|
||||
#| but not on the right. The result is a data frame with 3 rows: keys
|
||||
#| 1, 2, and 4, all values from val_y, and the corresponding values
|
||||
#| from val_x for keys 1 and 2 with an NA for key 4, val_x.
|
||||
#| Compared to the previous diagram showing a left join, the x table
|
||||
#| now gains a virtual row so that every row in y gets a match in x.
|
||||
#| val_x contains NA for the row in y that didn't match x.
|
||||
|
||||
knitr::include_graphics("diagrams/join/right.png", dpi = 270)
|
||||
```
|
||||
|
@@ -538,6 +534,7 @@ There are three types of outer joins:
|
|||
#| A visual representation of the full join where every row in `x`
|
||||
#| and `y` appears in the output.
|
||||
#| fig-alt: >
|
||||
#| Now both x and y have a virtual row that always matches.
|
||||
#| The result has 4 rows: keys 1, 2, 3, and 4 with all values
|
||||
#| from val_x and val_y, however key 3, val_y and key 4, val_x are NAs
|
||||
#| since those keys don't have a match in the other data frame.
|
||||
|
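The three outer joins covered by the alt text above can be compared side by side. This is a hedged sketch rather than the chapter's own code: the keys follow the diagrams, the value columns are assumed, and the result names (`left_out`, `right_out`, `full_out`) are hypothetical.

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Left join: every row of x is kept; key 3 has no match, so val_y is NA.
left_out <- left_join(x, y, join_by(key))

# Right join: every row of y is kept; key 4 has no match, so val_x is NA.
right_out <- right_join(x, y, join_by(key))

# Full join: rows from both tables are kept, with NA filling both gaps.
full_out <- full_join(x, y, join_by(key))

left_out
right_out
full_out
```

In each case the "virtual row" of the diagram shows up as an `NA` in the value column of the table that lacked the key.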
@@ -561,10 +558,10 @@ However, this is not a great representation because while it might jog your memo
|
|||
#| and y, with x on the right and y on the left. Shading indicates the
|
||||
#| result of the join.
|
||||
#|
|
||||
#| Inner join: Only intersection is shaded.
|
||||
#| Inner join: the intersection is shaded.
|
||||
#| Full join: Everything is shaded.
|
||||
#| Left join: All of x is shaded.
|
||||
#| Right: All of y is shaded.
|
||||
#| Right join: All of y is shaded.
|
||||
|
||||
knitr::include_graphics("diagrams/join/venn.png", dpi = 270)
|
||||
```
|
||||
|
@@ -690,11 +687,9 @@ This means that filtering joins never duplicate rows like mutating joins do.
|
|||
#| In a semi-join it only matters that there is a match; otherwise
|
||||
#| values in `y` don't affect the output.
|
||||
#| fig-alt: >
|
||||
#| Diagram of a semi join. Data frame x is on the left and has two columns
|
||||
#| (key and val_x) with keys 1, 2, and 3. Data frame y is on the right and also
|
||||
#| has two columns (key and val_y) with keys 1, 2, and 4. Semi joining these
|
||||
#| two results in a data frame with two rows and two columns (key and val_x),
|
||||
#| with keys 1 and 2 (the only keys that match between the two data frames).
|
||||
#| A join diagram with old friends x and y. In a semi join, only the
|
||||
#| presence of a match matters so the output contains the same columns
|
||||
#| as x.
|
||||
|
||||
knitr::include_graphics("diagrams/join/semi.png", dpi = 270)
|
||||
```
|
||||
|
@@ -707,11 +702,8 @@ knitr::include_graphics("diagrams/join/semi.png", dpi = 270)
|
|||
#| An anti-join is the inverse of a semi-join, dropping rows from `x`
|
||||
#| that have a match in `y`.
|
||||
#| fig-alt: >
|
||||
#| Diagram of an anti join. Data frame x is on the left and has two columns
|
||||
#| (key and val_x) with keys 1, 2, and 3. Data frame y is on the right and also
|
||||
#| has two columns (key and val_y) with keys 1, 2, and 4. Anti joining these
|
||||
#| two results in a data frame with one row and two columns (key and val_x),
|
||||
#| with key 3 only (the only key in x that is not in y).
|
||||
#| An anti-join is the inverse of a semi-join so matches are drawn with
|
||||
#| red lines indicating that they will be dropped from the output.
|
||||
|
||||
knitr::include_graphics("diagrams/join/anti.png", dpi = 270)
|
||||
```
|
||||
|
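The semi-join and anti-join alt texts describe two complementary filters on `x`. A small sketch, assuming the same example tables as the diagrams (the value columns are illustrative):

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Semi join: keep the rows of x that have a match in y; no columns from y.
semi <- semi_join(x, y, join_by(key))

# Anti join: keep the rows of x that have no match in y.
anti <- anti_join(x, y, join_by(key))

semi
anti
```

Together the two results partition `x`: every row lands in exactly one of them, and neither gains columns from `y`.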
@@ -737,7 +729,7 @@ x |> left_join(y, by = "key", keep = TRUE)
|
|||
#| A join diagram showing an inner join between x and y. The result
|
||||
#| now includes four columns: key.x, val_x, key.y, and val_y. The
|
||||
#| values of key.x and key.y are identical, which is why we usually
|
||||
#| omit one.
|
||||
#| only show one.
|
||||
#| echo: false
|
||||
#| out-width: ~
|
||||
|
||||
|
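To see the four-column result this alt text mentions, both keys can be retained with `keep = TRUE`. The sketch below uses an inner join to match the alt text (the hunk header shows the same argument on a left join); the tables are the assumed example data from the diagrams.

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# keep = TRUE retains the key from both inputs, suffixed with .x and .y.
kept <- inner_join(x, y, join_by(key), keep = TRUE)
names(kept)

# For an equi join the two key columns carry the same values.
identical(kept$key.x, kept$key.y)
```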
@@ -807,7 +799,8 @@ Inequality joins use `<`, `<=`, `>=`, or `>` to restrict the set of possible mat
|
|||
#| out-width: ~
|
||||
#| fig-cap: >
|
||||
#| An inequality join where `x` is joined to `y` on rows where the key
|
||||
#| of `x` is less than the key of `y`.
|
||||
#| of `x` is less than the key of `y`. This makes a triangular
|
||||
#| shape in the top-left corner.
|
||||
knitr::include_graphics("diagrams/join/lt.png", dpi = 270)
|
||||
```
|
||||
|
||||
|
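The less-than inequality join in this caption can be sketched as follows. It assumes dplyr 1.1.0 or later for `join_by()` and reuses the diagram's example keys; in `join_by(key < key)` the left-hand `key` refers to `x` and the right-hand `key` to `y`.

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# Match each row of x to every row of y with a strictly larger key.
# Inequality joins keep both key columns, suffixed .x and .y.
lt <- inner_join(x, y, join_by(key < key))
lt
```

Because the keys no longer have to be equal, a single row of `x` can match several rows of `y`, which is what produces the triangular pattern in the diagram.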
@@ -833,6 +826,10 @@ For example `join_by(closest(x <= y))` matches the smallest `y` that's greater t
|
|||
#| fig-cap: >
|
||||
#| A following join is similar to a greater-than-or-equal inequality join
|
||||
#| but only matches the first value.
|
||||
#| fig-alt: >
|
||||
#| A rolling join is a subset of an inequality join so some matches are
|
||||
#| grayed out indicating that they're not used because they're not the
|
||||
#| "closest".
|
||||
knitr::include_graphics("diagrams/join/closest.png", dpi = 270)
|
||||
```
|
||||
|
||||
|
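A rolling join of the kind this alt text describes might be written with `closest()`. This is a sketch assuming dplyr 1.1.0 or later and the diagram's example keys; it applies the `closest(x <= y)` form from the hunk header to the `key` columns.

```r
library(dplyr)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)

# For each row of x, keep only the smallest key in y that is >= the x key.
rolling <- inner_join(x, y, join_by(closest(key <= key)))
rolling
```

Compared with a plain `key <= key` inequality join, the grayed-out matches in the diagram are the extra rows that `closest()` filters away, leaving at most one match per row of `x`.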
|