equi join and non-equi join
parent daaf3ef52e
commit 386c9156b0
20
joins.qmd
@@ -19,7 +19,7 @@ This chapter will introduce you to two important types of joins:
 We'll begin by discussing keys, the variables used to connect a pair of data frames in a join.
 We cement the theory with an examination of the keys in the datasets from the nycflights13 package, then use that knowledge to start joining data frames together.
 Next we'll discuss how joins work, focusing on their action on the rows.
-We'll finish up with a discussion of non-equi-joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.
+We'll finish up with a discussion of non-equi joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.
 
 ### Prerequisites
 
@@ -283,8 +283,8 @@ You can override the default suffixes with the `suffix` argument.
 `join_by(tailnum)` is short for `join_by(tailnum == tailnum)`.
 It's important to know about this fuller form for two reasons.
 Firstly, it describes the relationship between the two tables: the keys must be equal.
-That's why this type of join is often called an **equi-join**.
-You'll learn about non-equi-joins in @sec-non-equi-joins.
+That's why this type of join is often called an **equi join**.
+You'll learn about non-equi joins in @sec-non-equi-joins.
 
 Secondly, it's how you specify different join keys in each table.
 For example, there are two ways to join the `flight2` and `airports` table: either by `dest` or `origin`:
@@ -575,7 +575,7 @@ knitr::include_graphics("diagrams/join/venn.png", dpi = 270)
 ```
 
 The joins shown here are the so-called **equi** **joins**, where rows match if the keys are equal.
-Equi-joins are the most common type of join, so we'll typically omit the equi prefix, and just say "inner join" rather than "equi inner join".
+Equi joins are the most common type of join, so we'll typically omit the equi prefix, and just say "inner join" rather than "equi inner join".
 We'll come back to non-equi joins in @sec-non-equi-joins.
 
 ### Row matching
@@ -666,11 +666,11 @@ knitr::include_graphics("diagrams/join/anti.png", dpi = 270)
 
 ## Non-equi joins {#sec-non-equi-joins}
 
-So far you've only seen equi-joins, joins where the rows match if the `x` key equals the `y` key.
+So far you've only seen equi joins, joins where the rows match if the `x` key equals the `y` key.
 Now we're going to relax that restriction and discuss other ways of determining if a pair of rows match.
 
 But before we can do that, we need to revisit a simplification we made above.
-In equi-joins the `x` keys and `y` are always equal, so we only need to show one in the output.
+In equi joins the `x` keys and `y` are always equal, so we only need to show one in the output.
 We can request that dplyr keep both keys with `keep = TRUE`, leading to the code below and the re-drawn `inner_join()` in @fig-inner-both.
 
 ```{r}
@@ -692,7 +692,7 @@ x |> left_join(y, by = "key", keep = TRUE)
 knitr::include_graphics("diagrams/join/inner-both.png", dpi = 270)
 ```
 
-When we move away from equi-joins we'll always show the keys, because the key values will often be different.
+When we move away from equi joins we'll always show the keys, because the key values will often be different.
 For example, instead of matching only when the `x$key` and `y$key` are equal, we could match whenever the `x$key` is greater than or equal to the `y$key`, leading to @fig-join-gte.
 dplyr's join functions understand this distinction equi and non-equi joins so will always show both keys when you perform a non-equi join.
 
@@ -711,7 +711,7 @@ dplyr's join functions understand this distinction equi and non-equi joins so wi
 knitr::include_graphics("diagrams/join/gte.png", dpi = 270)
 ```
 
-Non-equi-join isn't a particularly useful term because it only tells you what the join is not, not what it is. dplyr helps by identifying four particularly useful types of non-equi-join:
+Non-equi join isn't a particularly useful term because it only tells you what the join is not, not what it is. dplyr helps by identifying four particularly useful types of non-equi join:
 
 - **Cross joins** match every pair of rows.
 - **Inequality joins** use `<`, `<=`, `>`, and `>=` instead of `==`.
@@ -883,7 +883,7 @@ employees |>
 
 ### Exercises
 
-1. Can you explain what's happening with the keys in this equi-join?
+1. Can you explain what's happening with the keys in this equi join?
 Why are they different?
 
 ```{r}
@@ -901,7 +901,7 @@ employees |>
 In this chapter, you've learned how to use mutating and filtering joins to combine data from a pair of data frames.
 Along the way you learned how to identify keys, and the difference between primary and foreign keys.
 You also understand how joins work and how to figure out how many rows the output will have.
-Finally, you've gained a glimpse into the power of non-equi-joins and seen a few interesting use cases.
+Finally, you've gained a glimpse into the power of non-equi joins and seen a few interesting use cases.
 
 This chapter concludes the "Transform" part of the book where the focus was on the tools you could use with individual columns and tibbles.
 You learned about dplyr and base functions for working with logical vectors, numbers, and complete tables, stringr functions for working strings, lubridate functions for working with date-times, and forcats functions for working with factors.