More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
<section data-type="chapter" id="chp-joins">
<h1><span id="sec-joins" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Joins</span></span></h1><div data-type="note"><div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"/>
</div>

</div>

<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>

<h1><span id="sec-joins" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Joins</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
<section id="introduction" data-type="sect1">
<h1>
Introduction</h1>
@@ -57,14 +49,14 @@ Primary and foreign keys</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">airports
#> # A tibble: 1,458 × 8
#>   faa   name                             lat   lon   alt    tz dst   tzone
#>   <chr> <chr>                          <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 04G   Lansdowne Airport               41.1 -80.6  1044    -5 A     America/Ne…
#> 2 06A   Moton Field Municipal Airport   32.5 -85.7   264    -6 A     America/Ch…
#> 3 06C   Schaumburg Regional             42.0 -88.1   801    -6 A     America/Ch…
#> 4 06N   Randall Airport                 41.4 -74.4   523    -5 A     America/Ne…
#> 5 09J   Jekyll Island Airport           31.1 -81.4    11    -5 A     America/Ne…
#> 6 0A9   Elizabethton Municipal Airport  36.4 -82.2  1593    -5 A     America/Ne…
#>   faa   name                             lat   lon   alt    tz dst   tzone
#>   <chr> <chr>                          <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 04G   Lansdowne Airport               41.1 -80.6  1044    -5 A     America…
#> 2 06A   Moton Field Municipal Airport   32.5 -85.7   264    -6 A     America…
#> 3 06C   Schaumburg Regional             42.0 -88.1   801    -6 A     America…
#> 4 06N   Randall Airport                 41.4 -74.4   523    -5 A     America…
#> 5 09J   Jekyll Island Airport           31.1 -81.4    11    -5 A     America…
#> 6 0A9   Elizabethton Municipal Airport  36.4 -82.2  1593    -5 A     America…
#> # … with 1,452 more rows</pre>
</div>
</li>
@@ -73,14 +65,14 @@ Primary and foreign keys</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">planes
#> # A tibble: 3,322 × 9
#>   tailnum  year type                    manuf…¹ model engines seats speed engine
#>   <chr>   <int> <chr>                   <chr>   <chr>   <int> <int> <int> <chr>
#> 1 N10156   2004 Fixed wing multi engine EMBRAER EMB-…       2    55    NA Turbo…
#> 2 N102UW   1998 Fixed wing multi engine AIRBUS… A320…       2   182    NA Turbo…
#> 3 N103US   1999 Fixed wing multi engine AIRBUS… A320…       2   182    NA Turbo…
#> 4 N104UW   1999 Fixed wing multi engine AIRBUS… A320…       2   182    NA Turbo…
#> 5 N10575   2002 Fixed wing multi engine EMBRAER EMB-…       2    55    NA Turbo…
#> 6 N105UW   1999 Fixed wing multi engine AIRBUS… A320…       2   182    NA Turbo…
#>   tailnum  year type                 manuf…¹ model engines seats speed engine
#>   <chr>   <int> <chr>                <chr>   <chr>   <int> <int> <int> <chr>
#> 1 N10156   2004 Fixed wing multi en… EMBRAER EMB-…       2    55    NA Turbo…
#> 2 N102UW   1998 Fixed wing multi en… AIRBUS… A320…       2   182    NA Turbo…
#> 3 N103US   1999 Fixed wing multi en… AIRBUS… A320…       2   182    NA Turbo…
#> 4 N104UW   1999 Fixed wing multi en… AIRBUS… A320…       2   182    NA Turbo…
#> 5 N10575   2002 Fixed wing multi en… EMBRAER EMB-…       2    55    NA Turbo…
#> 6 N105UW   1999 Fixed wing multi en… AIRBUS… A320…       2   182    NA Turbo…
#> # … with 3,316 more rows, and abbreviated variable name ¹manufacturer</pre>
</div>
</li>
@@ -89,16 +81,17 @@ Primary and foreign keys</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">weather
#> # A tibble: 26,115 × 15
#>   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
#>   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
#> 1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
#> 2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
#> 3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
#> 4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
#> 5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
#> 6 EWR     2013     1     1     6  37.9  28.0  67.2      240      11.5         NA
#> # … with 26,109 more rows, and 4 more variables: precip <dbl>, pressure <dbl>,
#> #   visib <dbl>, time_hour <dttm></pre>
#>   origin  year month   day  hour  temp  dewp humid wind_dir wind_sp…¹ wind_…²
#>   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>     <dbl>   <dbl>
#> 1 EWR     2013     1     1     1  39.0  26.1  59.4      270     10.4       NA
#> 2 EWR     2013     1     1     2  39.0  27.0  61.6      250      8.06      NA
#> 3 EWR     2013     1     1     3  39.0  28.0  64.4      240     11.5       NA
#> 4 EWR     2013     1     1     4  39.9  28.0  62.2      250     12.7       NA
#> 5 EWR     2013     1     1     5  39.0  28.0  64.4      260     12.7       NA
#> 6 EWR     2013     1     1     6  37.9  28.0  67.2      240     11.5       NA
#> # … with 26,109 more rows, 4 more variables: precip <dbl>, pressure <dbl>,
#> #   visib <dbl>, time_hour <dttm>, and abbreviated variable names
#> #   ¹wind_speed, ²wind_gust</pre>
</div>
</li>
</ul><p>A <strong>foreign key</strong> is a variable (or set of variables) that corresponds to a primary key in another table. For example:</p>
@@ -147,8 +140,8 @@ weather |> 
  filter(is.na(tailnum))
#> # A tibble: 0 × 9
#> # … with 9 variables: tailnum <chr>, year <int>, type <chr>,
#> #   manufacturer <chr>, model <chr>, engines <int>, seats <int>, speed <int>,
#> #   engine <chr>
#> #   manufacturer <chr>, model <chr>, engines <int>, seats <int>,
#> #   speed <int>, engine <chr>

weather |> 
  filter(is.na(time_hour) | is.na(origin))
@@ -189,18 +182,19 @@ Surrogate keys</h2>
  mutate(id = row_number(), .before = 1)
flights2
#> # A tibble: 336,776 × 20
#>      id  year month   day dep_time sched_dep_t…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵
#>   <int> <int> <int> <int>    <int>         <int>   <dbl>   <int>   <int>   <dbl>
#> 1     1  2013     1     1      517           515       2     830     819      11
#> 2     2  2013     1     1      533           529       4     850     830      20
#> 3     3  2013     1     1      542           540       2     923     850      33
#> 4     4  2013     1     1      544           545      -1    1004    1022     -18
#> 5     5  2013     1     1      554           600      -6     812     837     -25
#> 6     6  2013     1     1      554           558      -4     740     728      12
#>      id  year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵
#>   <int> <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl>
#> 1     1  2013     1     1      517        515       2     830     819      11
#> 2     2  2013     1     1      533        529       4     850     830      20
#> 3     3  2013     1     1      542        540       2     923     850      33
#> 4     4  2013     1     1      544        545      -1    1004    1022     -18
#> 5     5  2013     1     1      554        600      -6     812     837     -25
#> 6     6  2013     1     1      554        558      -4     740     728      12
#> # … with 336,770 more rows, 10 more variables: carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated variable names
#> #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay</pre>
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated variable
#> #   names ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time,
#> #   ⁵arr_delay</pre>
</div>
<p>Surrogate keys can be particularly useful when communicating with other humans: it’s much easier to tell someone to take a look at flight 2001 than to say look at UA430 which departed 9am 2013-01-03.</p>
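<p>As a minimal sketch of that contrast, assuming the <code>nycflights13</code> data used throughout this chapter, the lookup below filters first on the surrogate <code>id</code> and then on a compound key. The values in the second filter are copied from the sentence above and have not been checked against the data, so treat them as illustrative only:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)
library(nycflights13)

# Same construction as above: number the rows to create a surrogate key
flights2 <- flights |> 
  mutate(id = row_number(), .before = 1)

# A single short value picks out one row
flights2 |> 
  filter(id == 2001)

# The compound-key lookup needs three values (illustrative, unverified)
flights2 |> 
  filter(
    carrier == "UA",
    flight == 430,
    time_hour == as.POSIXct("2013-01-03 09:00:00", tz = "America/New_York")
  )</pre>
</div>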
</section>
@@ -247,14 +241,14 @@ flights2
  left_join(airlines)
#> Joining with `by = join_by(carrier)`
#> # A tibble: 336,776 × 7
#>    year time_hour           origin dest  tailnum carrier name
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      United Air Lines Inc.
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      United Air Lines Inc.
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      American Airlines Inc.
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      JetBlue Airways
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Delta Air Lines Inc.
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      United Air Lines Inc.
#>    year time_hour           origin dest  tailnum carrier name
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      United Air Lines In…
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      United Air Lines In…
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      American Airlines I…
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      JetBlue Airways
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Delta Air Lines Inc.
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      United Air Lines In…
#> # … with 336,770 more rows</pre>
</div>
<p>Or we could find out the temperature and wind speed when each plane departed:</p>
@@ -279,14 +273,14 @@ flights2
  left_join(planes |> select(tailnum, type, engines, seats))
#> Joining with `by = join_by(tailnum)`
#> # A tibble: 336,776 × 9
#>    year time_hour           origin dest  tailnum carrier type      engines seats
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>       <int> <int>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      Fixed wi…       2   149
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      Fixed wi…       2   149
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      Fixed wi…       2   178
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      Fixed wi…       2   200
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Fixed wi…       2   178
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Fixed wi…       2   191
#>    year time_hour           origin dest  tailnum carrier type   engines seats
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>    <int> <int>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      Fixed…       2   149
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      Fixed…       2   149
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      Fixed…       2   178
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      Fixed…       2   200
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Fixed…       2   178
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Fixed…       2   191
#> # … with 336,770 more rows</pre>
</div>
<p>When <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">left_join()</a></code> fails to find a match for a row in <code>x</code>, it fills in the new variables with missing values. For example, there’s no information about the plane with tail number <code>N3ALAA</code> so the <code>type</code>, <code>engines</code>, and <code>seats</code> will be missing:</p>
@@ -318,14 +312,14 @@ Specifying join keys</h2>
  left_join(planes)
#> Joining with `by = join_by(year, tailnum)`
#> # A tibble: 336,776 × 13
#>    year time_hour           origin dest  tailnum carrier type  manufactu…¹ model
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr> <chr>       <chr>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      <NA>  <NA>        <NA>
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      <NA>  <NA>        <NA>
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      <NA>  <NA>        <NA>
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      <NA>  <NA>        <NA>
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      <NA>  <NA>        <NA>
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      <NA>  <NA>        <NA>
#>    year time_hour           origin dest  tailnum carrier type  manufa…¹ model
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr> <chr>    <chr>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      <NA>  <NA>     <NA>
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      <NA>  <NA>     <NA>
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      <NA>  <NA>     <NA>
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      <NA>  <NA>     <NA>
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      <NA>  <NA>     <NA>
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      <NA>  <NA>     <NA>
#> # … with 336,770 more rows, 4 more variables: engines <int>, seats <int>,
#> #   speed <int>, engine <chr>, and abbreviated variable name ¹manufacturer</pre>
</div>
@@ -334,17 +328,16 @@ Specifying join keys</h2>
<pre data-type="programlisting" data-code-language="downlit">flights2 |> 
  left_join(planes, join_by(tailnum))
#> # A tibble: 336,776 × 14
#>   year.x time_hour           origin dest  tailnum carrier year.y type    manuf…¹
#>    <int> <dttm>              <chr>  <chr> <chr>   <chr>    <int> <chr>   <chr>
#> 1   2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA        1999 Fixed … BOEING
#> 2   2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA        1998 Fixed … BOEING
#> 3   2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA        1990 Fixed … BOEING
#> 4   2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6        2012 Fixed … AIRBUS
#> 5   2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL        1991 Fixed … BOEING
#> 6   2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA        2012 Fixed … BOEING
#> # … with 336,770 more rows, 5 more variables: model <chr>, engines <int>,
#> #   seats <int>, speed <int>, engine <chr>, and abbreviated variable name
#> #   ¹manufacturer</pre>
#>   year.x time_hour           origin dest  tailnum carrier year.y type
#>    <int> <dttm>              <chr>  <chr> <chr>   <chr>    <int> <chr>
#> 1   2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA        1999 Fixed wing …
#> 2   2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA        1998 Fixed wing …
#> 3   2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA        1990 Fixed wing …
#> 4   2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6        2012 Fixed wing …
#> 5   2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL        1991 Fixed wing …
#> 6   2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA        2012 Fixed wing …
#> # … with 336,770 more rows, and 6 more variables: manufacturer <chr>,
#> #   model <chr>, engines <int>, seats <int>, speed <int>, engine <chr></pre>
</div>
<p>Note that the <code>year</code> variables are disambiguated in the output with a suffix (<code>year.x</code> and <code>year.y</code>), which tells you whether the variable came from the <code>x</code> or <code>y</code> argument. You can override the default suffixes with the <code>suffix</code> argument.</p>
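<p>For instance, overriding the suffixes might look like the sketch below. The suffix strings <code>_flight</code> and <code>_plane</code> are arbitrary choices for illustration, and <code>flights2</code> is rebuilt here with the six columns shown in the output above so that the snippet runs on its own:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)
library(nycflights13)

# Assumed definition of flights2, matching the columns printed above
flights2 <- flights |> 
  select(year, time_hour, origin, dest, tailnum, carrier)

# The two year columns are now named year_flight and year_plane
flights2 |> 
  left_join(planes, join_by(tailnum), suffix = c("_flight", "_plane")) |> 
  select(tailnum, year_flight, year_plane)</pre>
</div>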
<p><code>join_by(tailnum)</code> is short for <code>join_by(tailnum == tailnum)</code>. It’s important to know about this fuller form for two reasons. Firstly, it describes the relationship between the two tables: the keys must be equal. That’s why this type of join is often called an <strong>equi-join</strong>. You’ll learn about non-equi-joins in <a href="#sec-non-equi-joins" data-type="xref">#sec-non-equi-joins</a>.</p>
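<p>One way to convince yourself that the two spellings are interchangeable is to run both and compare the results. This is only a sketch: <code>short</code> and <code>full</code> are names made up here, and <code>flights2</code> is rebuilt (with an assumed definition) so the snippet is self-contained:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)
library(nycflights13)

# Assumed definition of flights2, matching the columns printed above
flights2 <- flights |> 
  select(year, time_hour, origin, dest, tailnum, carrier)

# Shorthand and fully spelled-out forms of the same equi-join
short <- flights2 |> left_join(planes, join_by(tailnum))
full  <- flights2 |> left_join(planes, join_by(tailnum == tailnum))

# Compare the two joined tibbles
identical(short, full)</pre>
</div>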
@@ -353,30 +346,30 @@ Specifying join keys</h2>
<pre data-type="programlisting" data-code-language="downlit">flights2 |> 
  left_join(airports, join_by(dest == faa))
#> # A tibble: 336,776 × 13
#>    year time_hour           origin dest  tailnum carrier name    lat   lon   alt
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr> <dbl> <dbl> <dbl>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      Geor…  30.0 -95.3    97
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      Geor…  30.0 -95.3    97
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      Miam…  25.8 -80.3     8
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      <NA>   NA    NA      NA
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Hart…  33.6 -84.4  1026
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Chic…  42.0 -87.9   668
#> # … with 336,770 more rows, and 3 more variables: tz <dbl>, dst <chr>,
#> #   tzone <chr>
#>    year time_hour           origin dest  tailnum carrier name       lat   lon
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>    <dbl> <dbl>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      George …  30.0 -95.3
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      George …  30.0 -95.3
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      Miami I…  25.8 -80.3
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      <NA>      NA    NA
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      Hartsfi…  33.6 -84.4
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Chicago…  42.0 -87.9
#> # … with 336,770 more rows, and 4 more variables: alt <dbl>, tz <dbl>,
#> #   dst <chr>, tzone <chr>

flights2 |> 
  left_join(airports, join_by(origin == faa))
#> # A tibble: 336,776 × 13
#>    year time_hour           origin dest  tailnum carrier name    lat   lon   alt
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr> <dbl> <dbl> <dbl>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      Newa…  40.7 -74.2    18
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      La G…  40.8 -73.9    22
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      John…  40.6 -73.8    13
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      John…  40.6 -73.8    13
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      La G…  40.8 -73.9    22
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Newa…  40.7 -74.2    18
#> # … with 336,770 more rows, and 3 more variables: tz <dbl>, dst <chr>,
#> #   tzone <chr></pre>
#>    year time_hour           origin dest  tailnum carrier name       lat   lon
#>   <int> <dttm>              <chr>  <chr> <chr>   <chr>   <chr>    <dbl> <dbl>
#> 1  2013 2013-01-01 05:00:00 EWR    IAH   N14228  UA      Newark …  40.7 -74.2
#> 2  2013 2013-01-01 05:00:00 LGA    IAH   N24211  UA      La Guar…  40.8 -73.9
#> 3  2013 2013-01-01 05:00:00 JFK    MIA   N619AA  AA      John F …  40.6 -73.8
#> 4  2013 2013-01-01 05:00:00 JFK    BQN   N804JB  B6      John F …  40.6 -73.8
#> 5  2013 2013-01-01 06:00:00 LGA    ATL   N668DN  DL      La Guar…  40.8 -73.9
#> 6  2013 2013-01-01 05:00:00 EWR    ORD   N39463  UA      Newark …  40.7 -74.2
#> # … with 336,770 more rows, and 4 more variables: alt <dbl>, tz <dbl>,
#> #   dst <chr>, tzone <chr></pre>
</div>
<p>In older code you might see a different way of specifying the join keys, using a character vector:</p>
<ul><li>
@@ -405,14 +398,14 @@ Filtering joins</h2>
<pre data-type="programlisting" data-code-language="downlit">airports |> 
  semi_join(flights2, join_by(faa == dest))
#> # A tibble: 101 × 8
#>   faa   name                                lat    lon   alt    tz dst   tzone
#>   <chr> <chr>                             <dbl>  <dbl> <dbl> <dbl> <chr> <chr>
#> 1 ABQ   Albuquerque International Sunport  35.0 -107.   5355    -7 A     Americ…
#> 2 ACK   Nantucket Mem                      41.3  -70.1    48    -5 A     Americ…
#> 3 ALB   Albany Intl                        42.7  -73.8   285    -5 A     Americ…
#> 4 ANC   Ted Stevens Anchorage Intl         61.2 -150.    152    -9 A     Americ…
#> 5 ATL   Hartsfield Jackson Atlanta Intl    33.6  -84.4  1026    -5 A     Americ…
#> 6 AUS   Austin Bergstrom Intl              30.2  -97.7   542    -6 A     Americ…
#>   faa   name                               lat    lon   alt    tz dst   tzone
#>   <chr> <chr>                            <dbl>  <dbl> <dbl> <dbl> <chr> <chr>
#> 1 ABQ   Albuquerque International Sunpo…  35.0 -107.   5355    -7 A     Amer…
#> 2 ACK   Nantucket Mem                     41.3  -70.1    48    -5 A     Amer…
#> 3 ALB   Albany Intl                       42.7  -73.8   285    -5 A     Amer…
#> 4 ANC   Ted Stevens Anchorage Intl        61.2 -150.    152    -9 A     Amer…
#> 5 ATL   Hartsfield Jackson Atlanta Intl   33.6  -84.4  1026    -5 A     Amer…
#> 6 AUS   Austin Bergstrom Intl             30.2  -97.7   542    -6 A     Amer…
#> # … with 95 more rows</pre>
</div>
<p><strong>Anti-joins</strong> are the opposite: they return all rows in <code>x</code> that don’t have a match in <code>y</code>. They’re useful for finding missing values that are <strong>implicit</strong> in the data, the topic of <a href="#sec-missing-implicit" data-type="xref">#sec-missing-implicit</a>. Implicitly missing values don’t show up as <code>NA</code>s but instead only exist as an absence. For example, we can find rows that are missing from <code>airports</code> by looking for flights that don’t have a matching destination airport:</p>
@@ -664,14 +657,14 @@ Allow multiple rows</h2>

plane_flights
#> # A tibble: 284,170 × 9
#>   tailnum type      engines seats  year time_hour           origin dest  carrier
#>   <chr>   <chr>       <int> <int> <int> <dttm>              <chr>  <chr> <chr>
#> 1 N10156  Fixed wi…       2    55  2013 2013-01-10 06:00:00 EWR    PIT   EV
#> 2 N10156  Fixed wi…       2    55  2013 2013-01-10 10:00:00 EWR    CHS   EV
#> 3 N10156  Fixed wi…       2    55  2013 2013-01-10 15:00:00 EWR    MSP   EV
#> 4 N10156  Fixed wi…       2    55  2013 2013-01-11 06:00:00 EWR    CMH   EV
#> 5 N10156  Fixed wi…       2    55  2013 2013-01-11 11:00:00 EWR    MCI   EV
#> 6 N10156  Fixed wi…       2    55  2013 2013-01-11 18:00:00 EWR    PWM   EV
#>   tailnum type   engines seats  year time_hour           origin dest  carrier
#>   <chr>   <chr>    <int> <int> <int> <dttm>              <chr>  <chr> <chr>
#> 1 N10156  Fixed…       2    55  2013 2013-01-10 06:00:00 EWR    PIT   EV
#> 2 N10156  Fixed…       2    55  2013 2013-01-10 10:00:00 EWR    CHS   EV
#> 3 N10156  Fixed…       2    55  2013 2013-01-10 15:00:00 EWR    MSP   EV
#> 4 N10156  Fixed…       2    55  2013 2013-01-11 06:00:00 EWR    CMH   EV
#> 5 N10156  Fixed…       2    55  2013 2013-01-11 11:00:00 EWR    MCI   EV
#> 6 N10156  Fixed…       2    55  2013 2013-01-11 18:00:00 EWR    PWM   EV
#> # … with 284,164 more rows</pre>
</div>
</section>