Joins proofing
parent 5485a91b49
commit 3e167168e7
15 joins.qmd
@@ -13,24 +13,22 @@ It's rare that a data analysis involves only a single data frame.
Typically you have many data frames, and you must **join** them together to answer the questions that you're interested in.
This chapter will introduce you to two important types of joins:

- Mutating joins, add new variables to one data frame from matching observations in another.
- Filtering joins, filter observations from one data frame based on whether or not they match an observation in another.
- Mutating joins, which add new variables to one data frame from matching observations in another.
- Filtering joins, which filter observations from one data frame based on whether or not they match an observation in another.

We'll begin by discussing keys, the variables used to connect a pair of data frames in a join.
You'll then see how to use joins to tackle a variety of challenges from the nycflights13 dataset.
We cement the theory with an examination of the keys in the nycflights13 datasets, then use that knowledge to start joining data frames together.
Next we'll discuss how joins work, focusing on their action on the rows.
We'll finish up with a discussion of non-equi-joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.

If you're familiar with SQL, you should find the ideas in this chapter familiar, as their realization in dplyr is very similar.

### Prerequisites

::: callout-important
This chapter relies on features only found in dplyr 1.1.0, which is still in development.
If you want to live life on the edge you can get the dev version with `devtools::install_github("tidyverse/dplyr")`.
If you want to live life on the edge, you can get the dev version with `devtools::install_github("tidyverse/dplyr")`.
:::

We'll explore the five related datasets from nycflights13 using the join functions from dplyr.
In this chapter, we'll explore the five related datasets from nycflights13 using the join functions from dplyr.

```{r}
#| label: setup
@@ -42,8 +40,7 @@ library(nycflights13)

## Keys

To understand joins, you need to first understand how two tables might be connected.
The connection between a pair of tables is defined by a pair of keys, which each consist of one or more variables.
To understand joins, you need to first understand how two tables can be connected through a pair of keys, with one on each table.
In this section, you'll learn about the two types of key and their realization in the datasets of the nycflights13 package.
You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.