More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
|
||||
<section data-type="chapter" id="chp-data-transform">
|
||||
<h1><span id="sec-data-transform" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data transformation</span></span></h1><div data-type="note"><div class="callout-body d-flex">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon"/>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
|
||||
|
||||
<h1><span id="sec-data-transform" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data transformation</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
|
||||
<section id="introduction" data-type="sect1">
|
||||
<h1>
|
||||
Introduction</h1>
|
||||
@@ -21,12 +13,12 @@ Prerequisites</h2>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">library(nycflights13)
|
||||
library(tidyverse)
|
||||
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
|
||||
#> ── Attaching packages ──────────────────────────────────── tidyverse 1.3.2 ──
|
||||
#> ✔ ggplot2 3.4.0.9000 ✔ purrr 0.9000.0.9000
|
||||
#> ✔ tibble 3.1.8 ✔ dplyr 1.0.99.9000
|
||||
#> ✔ tidyr 1.2.1.9001 ✔ stringr 1.4.1.9000
|
||||
#> ✔ readr 2.1.3 ✔ forcats 0.5.2
|
||||
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
|
||||
#> ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
|
||||
#> ✖ dplyr::filter() masks stats::filter()
|
||||
#> ✖ dplyr::lag() masks stats::lag()</pre>
|
||||
</div>
|
||||
@@ -40,14 +32,14 @@ nycflights13</h2>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -92,14 +84,14 @@ Rows</h1>
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
filter(arr_delay > 120)
|
||||
#> # A tibble: 10,034 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 811 630 101 1047 830 137 MQ
|
||||
#> 2 2013 1 1 848 1835 853 1001 1950 851 MQ
|
||||
#> 3 2013 1 1 957 733 144 1056 853 123 UA
|
||||
#> 4 2013 1 1 1114 900 134 1447 1222 145 UA
|
||||
#> 5 2013 1 1 1505 1310 115 1638 1431 127 EV
|
||||
#> 6 2013 1 1 1525 1340 105 1831 1626 125 B6
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 811 630 101 1047 830 137 MQ
|
||||
#> 2 2013 1 1 848 1835 853 1001 1950 851 MQ
|
||||
#> 3 2013 1 1 957 733 144 1056 853 123 UA
|
||||
#> 4 2013 1 1 1114 900 134 1447 1222 145 UA
|
||||
#> 5 2013 1 1 1505 1310 115 1638 1431 127 EV
|
||||
#> 6 2013 1 1 1525 1340 105 1831 1626 125 B6
|
||||
#> # … with 10,028 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -111,14 +103,14 @@ Rows</h1>
|
||||
flights |>
|
||||
filter(month == 1 & day == 1)
|
||||
#> # A tibble: 842 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 836 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -128,14 +120,14 @@ flights |>
|
||||
flights |>
|
||||
filter(month == 1 | month == 2)
|
||||
#> # A tibble: 51,955 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 51,949 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -147,14 +139,14 @@ flights |>
|
||||
flights |>
|
||||
filter(month %in% c(1, 2))
|
||||
#> # A tibble: 51,955 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 51,949 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -197,14 +189,14 @@ Common mistakes</h2>
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
arrange(year, month, day, dep_time)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -215,14 +207,14 @@ Common mistakes</h2>
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
arrange(desc(dep_delay))
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 9 641 900 1301 1242 1530 1272 HA
|
||||
#> 2 2013 6 15 1432 1935 1137 1607 2120 1127 MQ
|
||||
#> 3 2013 1 10 1121 1635 1126 1239 1810 1109 MQ
|
||||
#> 4 2013 9 20 1139 1845 1014 1457 2210 1007 AA
|
||||
#> 5 2013 7 22 845 1600 1005 1044 1815 989 MQ
|
||||
#> 6 2013 4 10 1100 1900 960 1342 2211 931 DL
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 9 641 900 1301 1242 1530 1272 HA
|
||||
#> 2 2013 6 15 1432 1935 1137 1607 2120 1127 MQ
|
||||
#> 3 2013 1 10 1121 1635 1126 1239 1810 1109 MQ
|
||||
#> 4 2013 9 20 1139 1845 1014 1457 2210 1007 AA
|
||||
#> 5 2013 7 22 845 1600 1005 1044 1815 989 MQ
|
||||
#> 6 2013 4 10 1100 1900 960 1342 2211 931 DL
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -234,14 +226,14 @@ Common mistakes</h2>
|
||||
filter(dep_delay <= 10 & dep_delay >= -10) |>
|
||||
arrange(desc(arr_delay))
|
||||
#> # A tibble: 239,109 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 11 1 658 700 -2 1329 1015 194 VX
|
||||
#> 2 2013 4 18 558 600 -2 1149 850 179 AA
|
||||
#> 3 2013 7 7 1659 1700 -1 2050 1823 147 US
|
||||
#> 4 2013 7 22 1606 1615 -9 2056 1831 145 DL
|
||||
#> 5 2013 9 19 648 641 7 1035 810 145 UA
|
||||
#> 6 2013 4 18 655 700 -5 1213 950 143 AA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 11 1 658 700 -2 1329 1015 194 VX
|
||||
#> 2 2013 4 18 558 600 -2 1149 850 179 AA
|
||||
#> 3 2013 7 7 1659 1700 -1 2050 1823 147 US
|
||||
#> 4 2013 7 22 1606 1615 -9 2056 1831 145 DL
|
||||
#> 5 2013 9 19 648 641 7 1035 810 145 UA
|
||||
#> 6 2013 4 18 655 700 -5 1213 950 143 AA
|
||||
#> # … with 239,103 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -285,14 +277,14 @@ Columns</h1>
|
||||
speed = distance / air_time * 60
|
||||
)
|
||||
#> # A tibble: 336,776 × 21
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 11 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, gain <dbl>, speed <dbl>, and abbreviated
|
||||
@@ -308,18 +300,19 @@ Columns</h1>
|
||||
.before = 1
|
||||
)
|
||||
#> # A tibble: 336,776 × 21
|
||||
#> gain speed year month day dep_time sched…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵
|
||||
#> <dbl> <dbl> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl>
|
||||
#> 1 -9 370. 2013 1 1 517 515 2 830 819 11
|
||||
#> 2 -16 374. 2013 1 1 533 529 4 850 830 20
|
||||
#> 3 -31 408. 2013 1 1 542 540 2 923 850 33
|
||||
#> 4 17 517. 2013 1 1 544 545 -1 1004 1022 -18
|
||||
#> 5 19 394. 2013 1 1 554 600 -6 812 837 -25
|
||||
#> 6 -16 288. 2013 1 1 554 558 -4 740 728 12
|
||||
#> # … with 336,770 more rows, 10 more variables: carrier <chr>, flight <int>,
|
||||
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
|
||||
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
#> # ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay</pre>
|
||||
#> gain speed year month day dep_time sched_dep_…¹ dep_d…² arr_t…³ sched…⁴
|
||||
#> <dbl> <dbl> <int> <int> <int> <int> <int> <dbl> <int> <int>
|
||||
#> 1 -9 370. 2013 1 1 517 515 2 830 819
|
||||
#> 2 -16 374. 2013 1 1 533 529 4 850 830
|
||||
#> 3 -31 408. 2013 1 1 542 540 2 923 850
|
||||
#> 4 17 517. 2013 1 1 544 545 -1 1004 1022
|
||||
#> 5 19 394. 2013 1 1 554 600 -6 812 837
|
||||
#> 6 -16 288. 2013 1 1 554 558 -4 740 728
|
||||
#> # … with 336,770 more rows, 11 more variables: arr_delay <dbl>,
|
||||
#> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
|
||||
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
|
||||
#> # time_hour <dttm>, and abbreviated variable names ¹sched_dep_time,
|
||||
#> # ²dep_delay, ³arr_time, ⁴sched_arr_time</pre>
|
||||
</div>
|
||||
<p>The <code>.</code> is a sign that <code>.before</code> is an argument to the function, not the name of a new variable. You can also use <code>.after</code> to add after a variable, and in both <code>.before</code> and <code>.after</code> you can use a variable name instead of a position. For example, we could add the new variables after <code>day</code>:</p>
|
||||
<div class="cell">
|
||||
@@ -330,18 +323,19 @@ Columns</h1>
|
||||
.after = day
|
||||
)
|
||||
#> # A tibble: 336,776 × 21
|
||||
#> year month day gain speed dep_time sched…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵
|
||||
#> <int> <int> <int> <dbl> <dbl> <int> <int> <dbl> <int> <int> <dbl>
|
||||
#> 1 2013 1 1 -9 370. 517 515 2 830 819 11
|
||||
#> 2 2013 1 1 -16 374. 533 529 4 850 830 20
|
||||
#> 3 2013 1 1 -31 408. 542 540 2 923 850 33
|
||||
#> 4 2013 1 1 17 517. 544 545 -1 1004 1022 -18
|
||||
#> 5 2013 1 1 19 394. 554 600 -6 812 837 -25
|
||||
#> 6 2013 1 1 -16 288. 554 558 -4 740 728 12
|
||||
#> # … with 336,770 more rows, 10 more variables: carrier <chr>, flight <int>,
|
||||
#> # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
|
||||
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
#> # ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay</pre>
|
||||
#> year month day gain speed dep_time sched_dep_…¹ dep_d…² arr_t…³ sched…⁴
|
||||
#> <int> <int> <int> <dbl> <dbl> <int> <int> <dbl> <int> <int>
|
||||
#> 1 2013 1 1 -9 370. 517 515 2 830 819
|
||||
#> 2 2013 1 1 -16 374. 533 529 4 850 830
|
||||
#> 3 2013 1 1 -31 408. 542 540 2 923 850
|
||||
#> 4 2013 1 1 17 517. 544 545 -1 1004 1022
|
||||
#> 5 2013 1 1 19 394. 554 600 -6 812 837
|
||||
#> 6 2013 1 1 -16 288. 554 558 -4 740 728
|
||||
#> # … with 336,770 more rows, 11 more variables: arr_delay <dbl>,
|
||||
#> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
|
||||
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
|
||||
#> # time_hour <dttm>, and abbreviated variable names ¹sched_dep_time,
|
||||
#> # ²dep_delay, ³arr_time, ⁴sched_arr_time</pre>
|
||||
</div>
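<p>The two placement styles can also be compared side by side on a much smaller table. The following is only a minimal sketch: <code>toy</code> is a hypothetical three-row tibble made up for illustration (it is not part of <code>nycflights13</code>), and it assumes dplyr 1.0.0 or later and R 4.1 or later for the native pipe.</p>

```r
library(dplyr)

# Hypothetical toy data standing in for a few columns of `flights`
toy <- tibble(
  day = 1:3,
  dep_delay = c(2, 4, -1),
  arr_delay = c(11, 20, -18)
)

# Place the new column by position: before the first column
by_position <- toy |>
  mutate(gain = dep_delay - arr_delay, .before = 1)

# Place the new column by name: immediately after `day`
by_name <- toy |>
  mutate(gain = dep_delay - arr_delay, .after = day)

names(by_position)
#> [1] "gain"      "day"       "dep_delay" "arr_delay"
names(by_name)
#> [1] "day"       "gain"      "dep_delay" "arr_delay"
```

<p>Referring to a column by name tends to be more robust than a position, because the code keeps working if columns are later added or reordered.</p>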
|
||||
<p>Alternatively, you can control which variables are kept with the <code>.keep</code> argument. A particularly useful value is <code>"used"</code>, which lets you see the inputs and outputs of your calculations:</p>
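<p>As a minimal sketch of how <code>.keep</code> behaves, the snippet below uses a hypothetical toy tibble (<code>toy</code> and its values are invented for illustration) with one column that the calculation never touches. It assumes dplyr 1.0.0 or later.</p>

```r
library(dplyr)

# Hypothetical toy data; `carrier` is not used in the calculation below
toy <- tibble(
  carrier = c("UA", "AA", "B6"),
  dep_delay = c(2, 4, -1),
  arr_delay = c(11, 20, -18)
)

# "used" keeps only the columns that appear in the expressions, plus the results
used <- toy |>
  mutate(gain = dep_delay - arr_delay, .keep = "used")

# "none" keeps only the newly created columns
none <- toy |>
  mutate(gain = dep_delay - arr_delay, .keep = "none")

names(used)
#> [1] "dep_delay" "arr_delay" "gain"
names(none)
#> [1] "gain"
```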
|
||||
<div class="cell">
|
||||
@@ -403,18 +397,18 @@ flights |>
|
||||
flights |>
|
||||
select(!year:day)
|
||||
#> # A tibble: 336,776 × 16
|
||||
#> dep_time sched…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier flight tailnum origin
|
||||
#> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr>
|
||||
#> 1 517 515 2 830 819 11 UA 1545 N14228 EWR
|
||||
#> 2 533 529 4 850 830 20 UA 1714 N24211 LGA
|
||||
#> 3 542 540 2 923 850 33 AA 1141 N619AA JFK
|
||||
#> 4 544 545 -1 1004 1022 -18 B6 725 N804JB JFK
|
||||
#> 5 554 600 -6 812 837 -25 DL 461 N668DN LGA
|
||||
#> 6 554 558 -4 740 728 12 UA 1696 N39463 EWR
|
||||
#> # … with 336,770 more rows, 6 more variables: dest <chr>, air_time <dbl>,
|
||||
#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>, and abbreviated
|
||||
#> # variable names ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time,
|
||||
#> # ⁵arr_delay
|
||||
#> dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier flight tailnum
|
||||
#> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr>
|
||||
#> 1 517 515 2 830 819 11 UA 1545 N14228
|
||||
#> 2 533 529 4 850 830 20 UA 1714 N24211
|
||||
#> 3 542 540 2 923 850 33 AA 1141 N619AA
|
||||
#> 4 544 545 -1 1004 1022 -18 B6 725 N804JB
|
||||
#> 5 554 600 -6 812 837 -25 DL 461 N668DN
|
||||
#> 6 554 558 -4 740 728 12 UA 1696 N39463
|
||||
#> # … with 336,770 more rows, 7 more variables: origin <chr>, dest <chr>,
|
||||
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
|
||||
#> # time_hour <dttm>, and abbreviated variable names ¹sched_dep_time,
|
||||
#> # ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
|
||||
|
||||
# Select all columns that are characters
|
||||
flights |>
|
||||
@@ -466,14 +460,14 @@ flights |>
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
rename(tail_num = tailnum)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tail_num <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -492,51 +486,51 @@ flights |>
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
relocate(time_hour, air_time)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> time_hour air_time year month day dep_t…¹ sched…² dep_d…³ arr_t…⁴
|
||||
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <int>
|
||||
#> 1 2013-01-01 05:00:00 227 2013 1 1 517 515 2 830
|
||||
#> 2 2013-01-01 05:00:00 227 2013 1 1 533 529 4 850
|
||||
#> 3 2013-01-01 05:00:00 160 2013 1 1 542 540 2 923
|
||||
#> 4 2013-01-01 05:00:00 183 2013 1 1 544 545 -1 1004
|
||||
#> 5 2013-01-01 06:00:00 116 2013 1 1 554 600 -6 812
|
||||
#> 6 2013-01-01 05:00:00 150 2013 1 1 554 558 -4 740
|
||||
#> # … with 336,770 more rows, 10 more variables: sched_arr_time <int>,
|
||||
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>,
|
||||
#> # dest <chr>, distance <dbl>, hour <dbl>, minute <dbl>, and abbreviated
|
||||
#> # variable names ¹dep_time, ²sched_dep_time, ³dep_delay, ⁴arr_time</pre>
|
||||
#> time_hour air_time year month day dep_time sched_dep…¹ dep_d…²
|
||||
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl>
|
||||
#> 1 2013-01-01 05:00:00 227 2013 1 1 517 515 2
|
||||
#> 2 2013-01-01 05:00:00 227 2013 1 1 533 529 4
|
||||
#> 3 2013-01-01 05:00:00 160 2013 1 1 542 540 2
|
||||
#> 4 2013-01-01 05:00:00 183 2013 1 1 544 545 -1
|
||||
#> 5 2013-01-01 06:00:00 116 2013 1 1 554 600 -6
|
||||
#> 6 2013-01-01 05:00:00 150 2013 1 1 554 558 -4
|
||||
#> # … with 336,770 more rows, 11 more variables: arr_time <int>,
|
||||
#> # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
|
||||
#> # tailnum <chr>, origin <chr>, dest <chr>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, and abbreviated variable names ¹sched_dep_time, ²dep_delay</pre>
|
||||
</div>
|
||||
<p>But you can use the same <code>.before</code> and <code>.after</code> arguments as <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> to choose where to put them:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
relocate(year:dep_time, .after = time_hour)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier flight tailnum origin dest
|
||||
#> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr>
|
||||
#> 1 515 2 830 819 11 UA 1545 N14228 EWR IAH
|
||||
#> 2 529 4 850 830 20 UA 1714 N24211 LGA IAH
|
||||
#> 3 540 2 923 850 33 AA 1141 N619AA JFK MIA
|
||||
#> 4 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN
|
||||
#> 5 600 -6 812 837 -25 DL 461 N668DN LGA ATL
|
||||
#> 6 558 -4 740 728 12 UA 1696 N39463 EWR ORD
|
||||
#> # … with 336,770 more rows, 9 more variables: air_time <dbl>, distance <dbl>,
|
||||
#> # hour <dbl>, minute <dbl>, time_hour <dttm>, year <int>, month <int>,
|
||||
#> # day <int>, dep_time <int>, and abbreviated variable names ¹sched_dep_time,
|
||||
#> # ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
|
||||
#> sched…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier flight tailnum origin dest
|
||||
#> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr>
|
||||
#> 1 515 2 830 819 11 UA 1545 N14228 EWR IAH
|
||||
#> 2 529 4 850 830 20 UA 1714 N24211 LGA IAH
|
||||
#> 3 540 2 923 850 33 AA 1141 N619AA JFK MIA
|
||||
#> 4 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN
|
||||
#> 5 600 -6 812 837 -25 DL 461 N668DN LGA ATL
|
||||
#> 6 558 -4 740 728 12 UA 1696 N39463 EWR ORD
|
||||
#> # … with 336,770 more rows, 9 more variables: air_time <dbl>,
|
||||
#> # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>, year <int>,
|
||||
#> # month <int>, day <int>, dep_time <int>, and abbreviated variable names
|
||||
#> # ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay
|
||||
flights |>
|
||||
relocate(starts_with("arr"), .before = dep_time)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> year month day arr_time arr_delay dep_time sched_…¹ dep_d…² sched…³ carrier
|
||||
#> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <int> <chr>
|
||||
#> 1 2013 1 1 830 11 517 515 2 819 UA
|
||||
#> 2 2013 1 1 850 20 533 529 4 830 UA
|
||||
#> 3 2013 1 1 923 33 542 540 2 850 AA
|
||||
#> 4 2013 1 1 1004 -18 544 545 -1 1022 B6
|
||||
#> 5 2013 1 1 812 -25 554 600 -6 837 DL
|
||||
#> 6 2013 1 1 740 12 554 558 -4 728 UA
|
||||
#> year month day arr_time arr_de…¹ dep_t…² sched…³ dep_d…⁴ sched…⁵ carrier
|
||||
#> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <int> <chr>
|
||||
#> 1 2013 1 1 830 11 517 515 2 819 UA
|
||||
#> 2 2013 1 1 850 20 533 529 4 830 UA
|
||||
#> 3 2013 1 1 923 33 542 540 2 850 AA
|
||||
#> 4 2013 1 1 1004 -18 544 545 -1 1022 B6
|
||||
#> 5 2013 1 1 812 -25 554 600 -6 837 DL
|
||||
#> 6 2013 1 1 740 12 554 558 -4 728 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
#> # ¹sched_dep_time, ²dep_delay, ³sched_arr_time</pre>
|
||||
#> # ¹arr_delay, ²dep_time, ³sched_dep_time, ⁴dep_delay, ⁵sched_arr_time</pre>
|
||||
</div>
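<p>The same <code>relocate()</code> calls are easier to check by eye on a one-row table. This is a sketch under stated assumptions: <code>toy</code> is a hypothetical tibble built for illustration, and the code assumes dplyr 1.0.0 or later.</p>

```r
library(dplyr)

# Hypothetical one-row table with a handful of `flights`-like columns
toy <- tibble(
  year = 2013,
  dep_time = 517,
  arr_time = 830,
  arr_delay = 11,
  carrier = "UA"
)

# Move a range of columns to sit after `carrier`
moved_after <- toy |>
  relocate(year:dep_time, .after = carrier)

# Move every column whose name starts with "arr" in front of `dep_time`
moved_before <- toy |>
  relocate(starts_with("arr"), .before = dep_time)

names(moved_after)
#> [1] "arr_time"  "arr_delay" "carrier"   "year"      "dep_time"
names(moved_before)
#> [1] "year"      "arr_time"  "arr_delay" "dep_time"  "carrier"
```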
|
||||
</section>
|
||||
|
||||
@@ -580,14 +574,14 @@ Groups</h1>
|
||||
group_by(month)
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> # Groups: month [12]
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -679,14 +673,14 @@ The <code>slice_</code> functions</h2>
|
||||
slice_max(arr_delay, n = 1)
|
||||
#> # A tibble: 108 × 19
|
||||
#> # Groups: dest [105]
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 7 22 2145 2007 98 132 2259 153 B6
|
||||
#> 2 2013 7 23 1139 800 219 1250 909 221 B6
|
||||
#> 3 2013 1 25 123 2000 323 229 2101 328 EV
|
||||
#> 4 2013 8 17 1740 1625 75 2042 2003 39 UA
|
||||
#> 5 2013 7 22 2257 759 898 121 1026 895 DL
|
||||
#> 6 2013 7 10 2056 1505 351 2347 1758 349 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 7 22 2145 2007 98 132 2259 153 B6
|
||||
#> 2 2013 7 23 1139 800 219 1250 909 221 B6
|
||||
#> 3 2013 1 25 123 2000 323 229 2101 328 EV
|
||||
#> 4 2013 8 17 1740 1625 75 2042 2003 39 UA
|
||||
#> 5 2013 7 22 2257 759 898 121 1026 895 DL
|
||||
#> 6 2013 7 10 2056 1505 351 2347 1758 349 UA
|
||||
#> # … with 102 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -725,14 +719,14 @@ Grouping by multiple variables</h2>
|
||||
daily
|
||||
#> # A tibble: 336,776 × 19
|
||||
#> # Groups: year, month, day [365]
|
||||
#> year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> year month day dep_time sched_…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
|
||||
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr>
|
||||
#> 1 2013 1 1 517 515 2 830 819 11 UA
|
||||
#> 2 2013 1 1 533 529 4 850 830 20 UA
|
||||
#> 3 2013 1 1 542 540 2 923 850 33 AA
|
||||
#> 4 2013 1 1 544 545 -1 1004 1022 -18 B6
|
||||
#> 5 2013 1 1 554 600 -6 812 837 -25 DL
|
||||
#> 6 2013 1 1 554 558 -4 740 728 12 UA
|
||||
#> # … with 336,770 more rows, 9 more variables: flight <int>, tailnum <chr>,
|
||||
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
|
||||
#> # minute <dbl>, time_hour <dttm>, and abbreviated variable names
|
||||
@@ -744,8 +738,8 @@ daily
|
||||
summarize(
|
||||
n = n()
|
||||
)
|
||||
#> `summarise()` has grouped output by 'year', 'month'. You can override using the
|
||||
#> `.groups` argument.</pre>
|
||||
#> `summarise()` has grouped output by 'year', 'month'. You can override using
|
||||
#> the `.groups` argument.</pre>
|
||||
</div>
|
||||
<p>If you’re happy with this behavior, you can explicitly request it in order to suppress the message:</p>
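<p>As a minimal sketch of what the explicit request looks like, the snippet below sets the <code>.groups</code> argument named in the message on a hypothetical four-row tibble (<code>toy</code> is invented for illustration). It assumes dplyr 1.0.0 or later, where <code>"drop_last"</code> removes only the last grouping variable and <code>"drop"</code> removes all grouping.</p>

```r
library(dplyr)

# Hypothetical toy data: four flights spread over three distinct days
toy <- tibble(
  year = 2013,
  month = c(1, 1, 1, 2),
  day = c(1, 1, 2, 1)
)

# Same result as the default, but stated explicitly, so no message is printed
daily_counts <- toy |>
  group_by(year, month, day) |>
  summarize(n = n(), .groups = "drop_last")

# Alternative: remove all grouping from the result
ungrouped <- toy |>
  group_by(year, month, day) |>
  summarize(n = n(), .groups = "drop")

group_vars(daily_counts)
#> [1] "year"  "month"
group_vars(ungrouped)
#> character(0)
```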
|
||||
<div class="cell">
|
||||
|
||||