More work on O'Reilly book

* Make width narrower
* Convert deps to table
* Strip chapter status
Hadley Wickham
2022-11-18 11:05:00 -06:00
parent 5895db09cd
commit 69b4597f3b
33 changed files with 784 additions and 1048 deletions


@@ -1,13 +1,5 @@
<section data-type="chapter" id="chp-data-tidy">
<h1><span id="sec-data-tidy" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data tidying</span></span></h1><div data-type="note"><div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"/>
</div>
</div>
<p>You are reading the work-in-progress second edition of R for Data Science. This chapter is largely complete and just needs final proof reading. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
<h1><span id="sec-data-tidy" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data tidying</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter is largely complete and just needs final proof reading. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
<section id="introduction" data-type="sect1">
<h1>
Introduction</h1>
@@ -174,21 +166,21 @@ Data in column names</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">billboard
#&gt; # A tibble: 317 × 79
#&gt; artist track date.ent…¹ wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;date&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA NA
#&gt; 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA NA NA
#&gt; 3 3 Door… Kryp… 2000-04-08 81 70 68 67 66 57 54 53 51
#&gt; 4 3 Door… Loser 2000-10-21 76 76 72 69 67 65 55 59 62
#&gt; 5 504 Bo Wobb… 2000-04-15 57 34 25 17 17 31 36 49 53
#&gt; 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2 3
#&gt; # … with 311 more rows, 67 more variables: wk10 &lt;dbl&gt;, wk11 &lt;dbl&gt;, wk12 &lt;dbl&gt;,
#&gt; # wk13 &lt;dbl&gt;, wk14 &lt;dbl&gt;, wk15 &lt;dbl&gt;, wk16 &lt;dbl&gt;, wk17 &lt;dbl&gt;, wk18 &lt;dbl&gt;,
#&gt; # wk19 &lt;dbl&gt;, wk20 &lt;dbl&gt;, wk21 &lt;dbl&gt;, wk22 &lt;dbl&gt;, wk23 &lt;dbl&gt;, wk24 &lt;dbl&gt;,
#&gt; # wk25 &lt;dbl&gt;, wk26 &lt;dbl&gt;, wk27 &lt;dbl&gt;, wk28 &lt;dbl&gt;, wk29 &lt;dbl&gt;, wk30 &lt;dbl&gt;,
#&gt; # wk31 &lt;dbl&gt;, wk32 &lt;dbl&gt;, wk33 &lt;dbl&gt;, wk34 &lt;dbl&gt;, wk35 &lt;dbl&gt;, wk36 &lt;dbl&gt;,
#&gt; # wk37 &lt;dbl&gt;, wk38 &lt;dbl&gt;, wk39 &lt;dbl&gt;, wk40 &lt;dbl&gt;, wk41 &lt;dbl&gt;, wk42 &lt;dbl&gt;,
#&gt; # wk43 &lt;dbl&gt;, wk44 &lt;dbl&gt;, wk45 &lt;dbl&gt;, wk46 &lt;dbl&gt;, wk47 &lt;dbl&gt;, wk48 &lt;dbl&gt;, …</pre>
#&gt; artist track date.ent…¹ wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;date&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA
#&gt; 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA NA
#&gt; 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 53
#&gt; 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 59
#&gt; 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 49
#&gt; 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2
#&gt; # … with 311 more rows, 68 more variables: wk9 &lt;dbl&gt;, wk10 &lt;dbl&gt;,
#&gt; # wk11 &lt;dbl&gt;, wk12 &lt;dbl&gt;, wk13 &lt;dbl&gt;, wk14 &lt;dbl&gt;, wk15 &lt;dbl&gt;, wk16 &lt;dbl&gt;,
#&gt; # wk17 &lt;dbl&gt;, wk18 &lt;dbl&gt;, wk19 &lt;dbl&gt;, wk20 &lt;dbl&gt;, wk21 &lt;dbl&gt;, wk22 &lt;dbl&gt;,
#&gt; # wk23 &lt;dbl&gt;, wk24 &lt;dbl&gt;, wk25 &lt;dbl&gt;, wk26 &lt;dbl&gt;, wk27 &lt;dbl&gt;, wk28 &lt;dbl&gt;,
#&gt; # wk29 &lt;dbl&gt;, wk30 &lt;dbl&gt;, wk31 &lt;dbl&gt;, wk32 &lt;dbl&gt;, wk33 &lt;dbl&gt;, wk34 &lt;dbl&gt;,
#&gt; # wk35 &lt;dbl&gt;, wk36 &lt;dbl&gt;, wk37 &lt;dbl&gt;, wk38 &lt;dbl&gt;, wk39 &lt;dbl&gt;, wk40 &lt;dbl&gt;,
#&gt; # wk41 &lt;dbl&gt;, wk42 &lt;dbl&gt;, wk43 &lt;dbl&gt;, wk44 &lt;dbl&gt;, wk45 &lt;dbl&gt;, …</pre>
</div>
<p>In this dataset, each observation is a song. The first three columns (<code>artist</code>, <code>track</code> and <code>date.entered</code>) are variables that describe the song. Then we have 76 columns (<code>wk1</code>-<code>wk76</code>) that describe the rank of the song in each week. Here, the column names are one variable (the <code>week</code>) and the cell values are another (the <code>rank</code>).</p>
<p>To tidy this data, we'll use <code><a href="https://tidyr.tidyverse.org/reference/pivot_longer.html">pivot_longer()</a></code>. After the data, there are three key arguments:</p>
@@ -347,21 +339,21 @@ Many variables in column names</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">who2
#&gt; # A tibble: 7,240 × 58
#&gt; country year sp_m_…¹ sp_m_…² sp_m_…³ sp_m_…⁴ sp_m_…⁵ sp_m_…⁶ sp_m_65 sp_f_…⁷
#&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 Afghani 1980 NA NA NA NA NA NA NA NA
#&gt; 2 Afghani 1981 NA NA NA NA NA NA NA NA
#&gt; 3 Afghani 1982 NA NA NA NA NA NA NA NA
#&gt; 4 Afghani 1983 NA NA NA NA NA NA NA NA
#&gt; 5 Afghani 1984 NA NA NA NA NA NA NA NA
#&gt; 6 Afghani 1985 NA NA NA NA NA NA NA NA
#&gt; # … with 7,234 more rows, 48 more variables: sp_f_1524 &lt;dbl&gt;, sp_f_2534 &lt;dbl&gt;,
#&gt; # sp_f_3544 &lt;dbl&gt;, sp_f_4554 &lt;dbl&gt;, sp_f_5564 &lt;dbl&gt;, sp_f_65 &lt;dbl&gt;,
#&gt; # sn_m_014 &lt;dbl&gt;, sn_m_1524 &lt;dbl&gt;, sn_m_2534 &lt;dbl&gt;, sn_m_3544 &lt;dbl&gt;,
#&gt; # sn_m_4554 &lt;dbl&gt;, sn_m_5564 &lt;dbl&gt;, sn_m_65 &lt;dbl&gt;, sn_f_014 &lt;dbl&gt;,
#&gt; # sn_f_1524 &lt;dbl&gt;, sn_f_2534 &lt;dbl&gt;, sn_f_3544 &lt;dbl&gt;, sn_f_4554 &lt;dbl&gt;,
#&gt; # sn_f_5564 &lt;dbl&gt;, sn_f_65 &lt;dbl&gt;, ep_m_014 &lt;dbl&gt;, ep_m_1524 &lt;dbl&gt;,
#&gt; # ep_m_2534 &lt;dbl&gt;, ep_m_3544 &lt;dbl&gt;, ep_m_4554 &lt;dbl&gt;, ep_m_5564 &lt;dbl&gt;, …</pre>
#&gt; country year sp_m_014 sp_m_1…¹ sp_m_…² sp_m_…³ sp_m_…⁴ sp_m_…⁵ sp_m_65
#&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 Afghanistan 1980 NA NA NA NA NA NA NA
#&gt; 2 Afghanistan 1981 NA NA NA NA NA NA NA
#&gt; 3 Afghanistan 1982 NA NA NA NA NA NA NA
#&gt; 4 Afghanistan 1983 NA NA NA NA NA NA NA
#&gt; 5 Afghanistan 1984 NA NA NA NA NA NA NA
#&gt; 6 Afghanistan 1985 NA NA NA NA NA NA NA
#&gt; # … with 7,234 more rows, 49 more variables: sp_f_014 &lt;dbl&gt;,
#&gt; # sp_f_1524 &lt;dbl&gt;, sp_f_2534 &lt;dbl&gt;, sp_f_3544 &lt;dbl&gt;, sp_f_4554 &lt;dbl&gt;,
#&gt; # sp_f_5564 &lt;dbl&gt;, sp_f_65 &lt;dbl&gt;, sn_m_014 &lt;dbl&gt;, sn_m_1524 &lt;dbl&gt;,
#&gt; # sn_m_2534 &lt;dbl&gt;, sn_m_3544 &lt;dbl&gt;, sn_m_4554 &lt;dbl&gt;, sn_m_5564 &lt;dbl&gt;,
#&gt; # sn_m_65 &lt;dbl&gt;, sn_f_014 &lt;dbl&gt;, sn_f_1524 &lt;dbl&gt;, sn_f_2534 &lt;dbl&gt;,
#&gt; # sn_f_3544 &lt;dbl&gt;, sn_f_4554 &lt;dbl&gt;, sn_f_5564 &lt;dbl&gt;, sn_f_65 &lt;dbl&gt;,
#&gt; # ep_m_014 &lt;dbl&gt;, ep_m_1524 &lt;dbl&gt;, ep_m_2534 &lt;dbl&gt;, ep_m_3544 &lt;dbl&gt;, …</pre>
</div>
<p>This dataset records information about tuberculosis diagnoses, collected by the WHO. There are two columns that are already variables and are easy to interpret: <code>country</code> and <code>year</code>. They are followed by 56 columns like <code>sp_m_014</code>, <code>ep_m_4554</code>, and <code>rel_m_3544</code>. If you stare at these columns for long enough, you'll notice there's a pattern. Each column name is made up of three pieces separated by <code>_</code>. The first piece, <code>sp</code>/<code>sn</code>/<code>rel</code>/<code>ep</code>, describes the method used for the <code>diagnosis</code>; the second piece, <code>m</code>/<code>f</code>, is the <code>gender</code>; and the third piece, <code>014</code>/<code>1524</code>/<code>2534</code>/<code>3544</code>/<code>4554</code>/<code>5564</code>/<code>65</code>, is the <code>age</code> range.</p>
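<p>As a minimal base R sketch of that pattern (an illustration only, not the approach the chapter goes on to use), splitting a few of the column names on <code>_</code> yields the three pieces. The object names <code>cols</code> and <code>parts</code> are made up for this example:</p>

```r
# Three of the column names mentioned above
cols <- c("sp_m_014", "ep_m_4554", "rel_m_3544")

# Split each name on the literal "_" separator
pieces <- strsplit(cols, "_", fixed = TRUE)

# Stack the pieces into a character matrix, one row per column name
parts <- do.call(rbind, pieces)
colnames(parts) <- c("diagnosis", "gender", "age")

parts
```

<p>Each row of <code>parts</code> holds the diagnosis method, gender, and age range encoded in one column name.</p>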
<p>So in this case we have six variables: two variables are already columns, three variables are contained in the column names, and one variable is in the cell values. This requires two changes to our call to <code><a href="https://tidyr.tidyverse.org/reference/pivot_longer.html">pivot_longer()</a></code>: <code>names_to</code> gets a vector of column names and <code>names_sep</code> describes how to split the variable name up into pieces:</p>
@@ -454,14 +446,14 @@ Widening data</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">cms_patient_experience
#&gt; # A tibble: 500 × 5
#&gt; org_pac_id org_nm measure_cd measure_title prf_r…¹
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_1 CAHPS for MIPS SSM… 63
#&gt; 2 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_2 CAHPS for MIPS SSM… 87
#&gt; 3 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_3 CAHPS for MIPS SSM… 86
#&gt; 4 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_5 CAHPS for MIPS SSM… 57
#&gt; 5 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_8 CAHPS for MIPS SSM… 85
#&gt; 6 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_12 CAHPS for MIPS SSM… 24
#&gt; org_pac_id org_nm measure_cd measure_title prf_r…¹
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_1 CAHPS for MIPS … 63
#&gt; 2 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_2 CAHPS for MIPS … 87
#&gt; 3 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_3 CAHPS for MIPS … 86
#&gt; 4 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_5 CAHPS for MIPS … 57
#&gt; 5 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_8 CAHPS for MIPS … 85
#&gt; 6 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_12 CAHPS for MIPS … 24
#&gt; # … with 494 more rows, and abbreviated variable name ¹prf_rate</pre>
</div>
<p>An observation is an organisation, but each organisation is spread across six rows, with one row for each variable, or measure. We can see the complete set of values for <code>measure_cd</code> and <code>measure_title</code> by using <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>:</p>
@@ -469,13 +461,13 @@ Widening data</h2>
<pre data-type="programlisting" data-code-language="downlit">cms_patient_experience |&gt;
distinct(measure_cd, measure_title)
#&gt; # A tibble: 6 × 2
#&gt; measure_cd measure_title
#&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 CAHPS_GRP_1 CAHPS for MIPS SSM: Getting Timely Care, Appointments, and Infor
#&gt; 2 CAHPS_GRP_2 CAHPS for MIPS SSM: How Well Providers Communicate
#&gt; 3 CAHPS_GRP_3 CAHPS for MIPS SSM: Patient's Rating of Provider
#&gt; 4 CAHPS_GRP_5 CAHPS for MIPS SSM: Health Promotion and Education
#&gt; 5 CAHPS_GRP_8 CAHPS for MIPS SSM: Courteous and Helpful Office Staff
#&gt; measure_cd measure_title
#&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 CAHPS_GRP_1 CAHPS for MIPS SSM: Getting Timely Care, Appointments, and In…
#&gt; 2 CAHPS_GRP_2 CAHPS for MIPS SSM: How Well Providers Communicate
#&gt; 3 CAHPS_GRP_3 CAHPS for MIPS SSM: Patient's Rating of Provider
#&gt; 4 CAHPS_GRP_5 CAHPS for MIPS SSM: Health Promotion and Education
#&gt; 5 CAHPS_GRP_8 CAHPS for MIPS SSM: Courteous and Helpful Office Staff
#&gt; 6 CAHPS_GRP_12 CAHPS for MIPS SSM: Stewardship of Patient Resources</pre>
</div>
<p>Neither of these columns will make particularly great variable names: <code>measure_cd</code> doesn't hint at the meaning of the variable and <code>measure_title</code> is a long sentence containing spaces. We'll use <code>measure_cd</code> for now, but in a real analysis you might want to create your own variable names that are both short and meaningful.</p>
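<p>One way to create such names, sketched here under assumptions, is a named lookup vector. The short labels in <code>short_names</code> are hypothetical (invented from the measure titles shown above), and <code>toy</code> is a two-row stand-in for <code>cms_patient_experience</code>:</p>

```r
library(dplyr)

# Hypothetical short names: these labels are an illustration,
# not part of the dataset
short_names <- c(
  CAHPS_GRP_1  = "timely_care",
  CAHPS_GRP_2  = "communication",
  CAHPS_GRP_3  = "provider_rating",
  CAHPS_GRP_5  = "health_promotion",
  CAHPS_GRP_8  = "office_staff",
  CAHPS_GRP_12 = "stewardship"
)

# A two-row stand-in for cms_patient_experience
toy <- tibble(
  org_pac_id = c("0446157747", "0446157747"),
  measure_cd = c("CAHPS_GRP_1", "CAHPS_GRP_2"),
  prf_rate   = c(63, 87)
)

# Look up each code in the named vector to get its short label
renamed <- toy |>
  mutate(measure = unname(short_names[measure_cd]))

renamed
```

<p>The new <code>measure</code> column could then stand in for <code>measure_cd</code> wherever column names are needed.</p>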
@@ -487,14 +479,14 @@ Widening data</h2>
values_from = prf_rate
)
#&gt; # A tibble: 500 × 9
#&gt; org_pac_id org_nm measu…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶ CAHPS…⁷
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CARE M… CAHPS … 63 NA NA NA NA NA
#&gt; 2 0446157747 USC CARE M… CAHPS … NA 87 NA NA NA NA
#&gt; 3 0446157747 USC CARE M… CAHPS … NA NA 86 NA NA NA
#&gt; 4 0446157747 USC CARE M… CAHPS … NA NA NA 57 NA NA
#&gt; 5 0446157747 USC CARE M… CAHPS … NA NA NA NA 85 NA
#&gt; 6 0446157747 USC CARE M… CAHPS … NA NA NA NA NA 24
#&gt; org_pac_id org_nm measu…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶ CAHPS…⁷
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CAR… CAHPS … 63 NA NA NA NA NA
#&gt; 2 0446157747 USC CAR… CAHPS … NA 87 NA NA NA NA
#&gt; 3 0446157747 USC CAR… CAHPS … NA NA 86 NA NA NA
#&gt; 4 0446157747 USC CAR… CAHPS … NA NA NA 57 NA NA
#&gt; 5 0446157747 USC CAR… CAHPS … NA NA NA NA 85 NA
#&gt; 6 0446157747 USC CAR… CAHPS … NA NA NA NA NA 24
#&gt; # … with 494 more rows, and abbreviated variable names ¹measure_title,
#&gt; # ²CAHPS_GRP_1, ³CAHPS_GRP_2, ⁴CAHPS_GRP_3, ⁵CAHPS_GRP_5, ⁶CAHPS_GRP_8,
#&gt; # ⁷CAHPS_GRP_12</pre>
@@ -508,14 +500,14 @@ Widening data</h2>
values_from = prf_rate
)
#&gt; # A tibble: 95 × 8
#&gt; org_pac_id org_nm CAHPS…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CARE MEDICAL G… 63 87 86 57 85 24
#&gt; 2 0446162697 ASSOCIATION OF UNI… 59 85 83 63 88 22
#&gt; 3 0547164295 BEAVER MEDICAL GRO… 49 NA 75 44 73 12
#&gt; 4 0749333730 CAPE PHYSICIANS AS… 67 84 85 65 82 24
#&gt; 5 0840104360 ALLIANCE PHYSICIAN… 66 87 87 64 87 28
#&gt; 6 0840109864 REX HOSPITAL INC 73 87 84 67 91 30
#&gt; org_pac_id org_nm CAHPS…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 0446157747 USC CARE MEDICA… 63 87 86 57 85 24
#&gt; 2 0446162697 ASSOCIATION OF … 59 85 83 63 88 22
#&gt; 3 0547164295 BEAVER MEDICAL … 49 NA 75 44 73 12
#&gt; 4 0749333730 CAPE PHYSICIANS… 67 84 85 65 82 24
#&gt; 5 0840104360 ALLIANCE PHYSIC… 66 87 87 64 87 28
#&gt; 6 0840109864 REX HOSPITAL INC 73 87 84 67 91 30
#&gt; # … with 89 more rows, and abbreviated variable names ¹CAHPS_GRP_1,
#&gt; # ²CAHPS_GRP_2, ³CAHPS_GRP_3, ⁴CAHPS_GRP_5, ⁵CAHPS_GRP_8, ⁶CAHPS_GRP_12</pre>
</div>
@@ -602,7 +594,8 @@ How does <code>pivot_wider()</code> work?</h2>
names_from = name,
values_from = value
)
#&gt; Warning: Values from `value` are not uniquely identified; output will contain list-cols.
#&gt; Warning: Values from `value` are not uniquely identified; output will contain
#&gt; list-cols.
#&gt; • Use `values_fn = list` to suppress this warning.
#&gt; • Use `values_fn = {summary_fun}` to summarise duplicates.
#&gt; • Use the following dplyr code to identify duplicates.
@@ -695,15 +688,16 @@ col_year &lt;- gapminder |&gt;
)
col_year
#&gt; # A tibble: 142 × 13
#&gt; country `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992` `1997`
#&gt; &lt;fct&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 Afghani 2.89 2.91 2.93 2.92 2.87 2.90 2.99 2.93 2.81 2.80
#&gt; 2 Albania 3.20 3.29 3.36 3.44 3.52 3.55 3.56 3.57 3.40 3.50
#&gt; 3 Algeria 3.39 3.48 3.41 3.51 3.62 3.69 3.76 3.75 3.70 3.68
#&gt; 4 Angola 3.55 3.58 3.63 3.74 3.74 3.48 3.44 3.39 3.42 3.36
#&gt; 5 Argenti 3.77 3.84 3.85 3.91 3.98 4.00 3.95 3.96 3.97 4.04
#&gt; 6 Austral 4.00 4.04 4.09 4.16 4.23 4.26 4.29 4.34 4.37 4.43
#&gt; # … with 136 more rows, and 2 more variables: `2002` &lt;dbl&gt;, `2007` &lt;dbl&gt;</pre>
#&gt; country `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
#&gt; &lt;fct&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 Afghanistan 2.89 2.91 2.93 2.92 2.87 2.90 2.99 2.93 2.81
#&gt; 2 Albania 3.20 3.29 3.36 3.44 3.52 3.55 3.56 3.57 3.40
#&gt; 3 Algeria 3.39 3.48 3.41 3.51 3.62 3.69 3.76 3.75 3.70
#&gt; 4 Angola 3.55 3.58 3.63 3.74 3.74 3.48 3.44 3.39 3.42
#&gt; 5 Argentina 3.77 3.84 3.85 3.91 3.98 4.00 3.95 3.96 3.97
#&gt; 6 Australia 4.00 4.04 4.09 4.16 4.23 4.26 4.29 4.34 4.37
#&gt; # … with 136 more rows, and 3 more variables: `1997` &lt;dbl&gt;, `2002` &lt;dbl&gt;,
#&gt; # `2007` &lt;dbl&gt;</pre>
</div>
<p><code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> produces a tibble where each row is labelled by the <code>country</code> variable. But most classic statistical algorithms don't want the identifier as an explicit variable; they want it as a <strong>row name</strong>. We can turn the <code>country</code> variable into row names with <code>column_to_rownames()</code>:</p>
<div class="cell">