Re-render book for O'Reilly

2023-01-12 17:22:57 -06:00
parent 28671ed8bd
commit 360d65ae47
113 changed files with 4957 additions and 2997 deletions
--- a/oreilly/data-tidy.html
+++ b/oreilly/data-tidy.html
@@ -12,12 +12,12 @@ Introduction</h1>
 — Hadley Wickham</p>
 </blockquote>
 <p>In this chapter, you will learn a consistent way to organize your data in R using a system called <strong>tidy data</strong>. Getting your data into this format requires some work up front, but that work pays off in the long term. Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the data questions you care about.</p>
-<p>In this chapter, you’ll first learn the definition of tidy data and see it applied to simple toy dataset. Then we’ll dive into the main tool you’ll use for tidying data: pivoting. Pivoting allows you to change the form of your data, without changing any of the values. We’ll finish up with a discussion of usefully untidy data, and how you can create it if needed.</p>
+<p>In this chapter, you’ll first learn the definition of tidy data and see it applied to a simple toy dataset. Then we’ll dive into the primary tool you’ll use for tidying data: pivoting. Pivoting allows you to change the form of your data without changing any of the values. We’ll finish with a discussion of usefully untidy data and how you can create it if needed.</p>

 <section id="prerequisites" data-type="sect2">
 <h2>
 Prerequisites</h2>
-<p>In this chapter we’ll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets. tidyr is a member of the core tidyverse.</p>
+<p>In this chapter, we’ll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets. tidyr is a member of the core tidyverse.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">library(tidyverse)</pre>
 </div>
@@ -28,7 +28,7 @@ Prerequisites</h2>
 <section id="sec-tidy-data" data-type="sect1">
 <h1>
 Tidy data</h1>
-<p>You can represent the same underlying data in multiple ways. The example below shows the same data organised in four different ways. Each dataset shows the same values of four variables: <em>country</em>, <em>year</em>, <em>population</em>, and <em>cases</em> of TB (tuberculosis), but each dataset organizes the values in a different way.</p>
+<p>You can represent the same underlying data in multiple ways. The example below shows the same data organized in four different ways. Each dataset shows the same values of four variables: <em>country</em>, <em>year</em>, <em>population</em>, and <em>cases</em> of TB (tuberculosis), but each dataset organizes the values in a different way.</p>

 <!-- TODO redraw as tables -->
 <div class="cell">
@@ -83,7 +83,7 @@ table4b # population
 <p>These are all representations of the same underlying data, but they are not equally easy to use. One of them, <code>table1</code>, will be much easier to work with inside the tidyverse because it’s tidy.</p>
 <p>There are three interrelated rules that make a dataset tidy:</p>
 <ol type="1"><li>Each variable is a column; each column is a variable.</li>
-<li>Each observation is row; each row is an observation.</li>
+<li>Each observation is a row; each row is an observation.</li>
 <li>Each value is a cell; each cell is a single value.</li>
 </ol><p><a href="#fig-tidy-structure" data-type="xref">#fig-tidy-structure</a> shows the rules visually.</p>
 <div class="cell">
@@ -96,8 +96,8 @@ table4b # population
 </div>
 <p>Why ensure that your data is tidy? There are two main advantages:</p>
 <ol type="1"><li><p>There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity.</p></li>
-<li><p>There’s a specific advantage to placing variables in columns because it allows R’s vectorised nature to shine. As you learned in <a href="#sec-mutate" data-type="xref">#sec-mutate</a> and <a href="#sec-summarize" data-type="xref">#sec-summarize</a>, most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural.</p></li>
-</ol><p>dplyr, ggplot2, and all the other packages in the tidyverse are designed to work with tidy data. Here are a couple of small examples showing how you might work with <code>table1</code>.</p>
+<li><p>There’s a specific advantage to placing variables in columns because it allows R’s vectorized nature to shine. As you learned in <a href="#sec-mutate" data-type="xref">#sec-mutate</a> and <a href="#sec-summarize" data-type="xref">#sec-summarize</a>, most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural.</p></li>
+</ol><p>dplyr, ggplot2, and all the other packages in the tidyverse are designed to work with tidy data. Here are a few small examples showing how you might work with <code>table1</code>.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># Compute rate per 10,000
 table1 |&gt;
@@ -124,12 +124,12 @@ table1 |&gt;
 #&gt; 2  2000 296920

 # Visualise changes over time
-ggplot(table1, aes(year, cases)) +
+ggplot(table1, aes(x = year, y = cases)) +
  geom_line(aes(group = country), color = "grey50") +
  geom_point(aes(color = country, shape = country)) +
  scale_x_continuous(breaks = c(1999, 2000))</pre>
 <div class="cell-output-display">
-<p><img src="data-tidy_files/figure-html/unnamed-chunk-5-1.png" alt="This figure shows the numbers of cases in 1999 and 2000 for Afghanistan, Brazil, and China, with year on the x-axis and number of cases on the y-axis. Each point on the plot represents the number of cases in a given country in a given year. The points for each country are differentiated from others by color and shape and connected with a line, resulting in three, non-parallel, non-intersecting lines. The numbers of cases in China are highest for both 1999 and 2000, with values above 200,000 for both years. The number of cases in Brazil is approximately 40,000 in 1999 and approximately 75,000 in 2000. The numbers of cases in Afghanistan are lowest for both 1999 and 2000, with values that appear to be very close to 0 on this scale." width="480"/></p>
+<p><img src="data-tidy_files/figure-html/unnamed-chunk-5-1.png" alt="This figure shows the number of cases in 1999 and 2000 for Afghanistan, Brazil, and China, with year on the x-axis and number of cases on the y-axis. Each point on the plot represents the number of cases in a given country in a given year. The points for each country are differentiated from others by color and shape and connected with a line, resulting in three, non-parallel, non-intersecting lines. The numbers of cases in China are highest for both 1999 and 2000, with values above 200,000 for both years. The number of cases in Brazil is approximately 40,000 in 1999 and approximately 75,000 in 2000. The numbers of cases in Afghanistan are lowest for both 1999 and 2000, with values that appear to be very close to 0 on this scale." width="480"/></p>
 </div>
 </div>

@@ -166,15 +166,15 @@ Data in column names</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">billboard
 #&gt; # A tibble: 317 × 79
-#&gt;   artist     track date.ent…¹   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8
-#&gt;   &lt;chr&gt;      &lt;chr&gt; &lt;date&gt;     &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
-#&gt; 1 2 Pac      Baby… 2000-02-26    87    82    72    77    87    94    99    NA
-#&gt; 2 2Ge+her    The … 2000-09-02    91    87    92    NA    NA    NA    NA    NA
-#&gt; 3 3 Doors D… Kryp… 2000-04-08    81    70    68    67    66    57    54    53
-#&gt; 4 3 Doors D… Loser 2000-10-21    76    76    72    69    67    65    55    59
-#&gt; 5 504 Boyz   Wobb… 2000-04-15    57    34    25    17    17    31    36    49
-#&gt; 6 98^0       Give… 2000-08-19    51    39    34    26    26    19     2     2
-#&gt; # … with 311 more rows, 68 more variables: wk9 &lt;dbl&gt;, wk10 &lt;dbl&gt;,
+#&gt;   artist   track date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8
+#&gt;   &lt;chr&gt;    &lt;chr&gt; &lt;date&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
+#&gt; 1 2 Pac    Baby… 2000-02-26      87    82    72    77    87    94    99    NA
+#&gt; 2 2Ge+her  The … 2000-09-02      91    87    92    NA    NA    NA    NA    NA
+#&gt; 3 3 Doors… Kryp… 2000-04-08      81    70    68    67    66    57    54    53
+#&gt; 4 3 Doors… Loser 2000-10-21      76    76    72    69    67    65    55    59
+#&gt; 5 504 Boyz Wobb… 2000-04-15      57    34    25    17    17    31    36    49
+#&gt; 6 98^0     Give… 2000-08-19      51    39    34    26    26    19     2     2
+#&gt; # … with 311 more rows, and 68 more variables: wk9 &lt;dbl&gt;, wk10 &lt;dbl&gt;,
 #&gt; #   wk11 &lt;dbl&gt;, wk12 &lt;dbl&gt;, wk13 &lt;dbl&gt;, wk14 &lt;dbl&gt;, wk15 &lt;dbl&gt;, wk16 &lt;dbl&gt;,
 #&gt; #   wk17 &lt;dbl&gt;, wk18 &lt;dbl&gt;, wk19 &lt;dbl&gt;, wk20 &lt;dbl&gt;, wk21 &lt;dbl&gt;, wk22 &lt;dbl&gt;,
 #&gt; #   wk23 &lt;dbl&gt;, wk24 &lt;dbl&gt;, wk25 &lt;dbl&gt;, wk26 &lt;dbl&gt;, wk27 &lt;dbl&gt;, wk28 &lt;dbl&gt;,
@@ -261,7 +261,7 @@ billboard_tidy
 <p>Now we’re in a good position to look at how song ranks vary over time by drawing a plot. The code is shown below and the result is <a href="#fig-billboard-ranks" data-type="xref">#fig-billboard-ranks</a>.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">billboard_tidy |&gt; 
-  ggplot(aes(week, rank, group = track)) + 
+  ggplot(aes(x = week, y = rank, group = track)) + 
  geom_line(alpha = 1/3) + 
  scale_y_reverse()</pre>
 <div class="cell-output-display">
@@ -339,21 +339,21 @@ Many variables in column names</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">who2
 #&gt; # A tibble: 7,240 × 58
-#&gt;   country      year sp_m_014 sp_m_1…¹ sp_m_…² sp_m_…³ sp_m_…⁴ sp_m_…⁵ sp_m_65
-#&gt;   &lt;chr&gt;       &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;
-#&gt; 1 Afghanistan  1980       NA       NA      NA      NA      NA      NA      NA
-#&gt; 2 Afghanistan  1981       NA       NA      NA      NA      NA      NA      NA
-#&gt; 3 Afghanistan  1982       NA       NA      NA      NA      NA      NA      NA
-#&gt; 4 Afghanistan  1983       NA       NA      NA      NA      NA      NA      NA
-#&gt; 5 Afghanistan  1984       NA       NA      NA      NA      NA      NA      NA
-#&gt; 6 Afghanistan  1985       NA       NA      NA      NA      NA      NA      NA
-#&gt; # … with 7,234 more rows, 49 more variables: sp_f_014 &lt;dbl&gt;,
-#&gt; #   sp_f_1524 &lt;dbl&gt;, sp_f_2534 &lt;dbl&gt;, sp_f_3544 &lt;dbl&gt;, sp_f_4554 &lt;dbl&gt;,
-#&gt; #   sp_f_5564 &lt;dbl&gt;, sp_f_65 &lt;dbl&gt;, sn_m_014 &lt;dbl&gt;, sn_m_1524 &lt;dbl&gt;,
-#&gt; #   sn_m_2534 &lt;dbl&gt;, sn_m_3544 &lt;dbl&gt;, sn_m_4554 &lt;dbl&gt;, sn_m_5564 &lt;dbl&gt;,
-#&gt; #   sn_m_65 &lt;dbl&gt;, sn_f_014 &lt;dbl&gt;, sn_f_1524 &lt;dbl&gt;, sn_f_2534 &lt;dbl&gt;,
-#&gt; #   sn_f_3544 &lt;dbl&gt;, sn_f_4554 &lt;dbl&gt;, sn_f_5564 &lt;dbl&gt;, sn_f_65 &lt;dbl&gt;,
-#&gt; #   ep_m_014 &lt;dbl&gt;, ep_m_1524 &lt;dbl&gt;, ep_m_2534 &lt;dbl&gt;, ep_m_3544 &lt;dbl&gt;, …</pre>
+#&gt;   country     year sp_m_014 sp_m_1524 sp_m_2534 sp_m_3544 sp_m_4554 sp_m_5564
+#&gt;   &lt;chr&gt;      &lt;dbl&gt;    &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;
+#&gt; 1 Afghanist…  1980       NA        NA        NA        NA        NA        NA
+#&gt; 2 Afghanist…  1981       NA        NA        NA        NA        NA        NA
+#&gt; 3 Afghanist…  1982       NA        NA        NA        NA        NA        NA
+#&gt; 4 Afghanist…  1983       NA        NA        NA        NA        NA        NA
+#&gt; 5 Afghanist…  1984       NA        NA        NA        NA        NA        NA
+#&gt; 6 Afghanist…  1985       NA        NA        NA        NA        NA        NA
+#&gt; # … with 7,234 more rows, and 50 more variables: sp_m_65 &lt;dbl&gt;,
+#&gt; #   sp_f_014 &lt;dbl&gt;, sp_f_1524 &lt;dbl&gt;, sp_f_2534 &lt;dbl&gt;, sp_f_3544 &lt;dbl&gt;,
+#&gt; #   sp_f_4554 &lt;dbl&gt;, sp_f_5564 &lt;dbl&gt;, sp_f_65 &lt;dbl&gt;, sn_m_014 &lt;dbl&gt;,
+#&gt; #   sn_m_1524 &lt;dbl&gt;, sn_m_2534 &lt;dbl&gt;, sn_m_3544 &lt;dbl&gt;, sn_m_4554 &lt;dbl&gt;,
+#&gt; #   sn_m_5564 &lt;dbl&gt;, sn_m_65 &lt;dbl&gt;, sn_f_014 &lt;dbl&gt;, sn_f_1524 &lt;dbl&gt;,
+#&gt; #   sn_f_2534 &lt;dbl&gt;, sn_f_3544 &lt;dbl&gt;, sn_f_4554 &lt;dbl&gt;, sn_f_5564 &lt;dbl&gt;,
+#&gt; #   sn_f_65 &lt;dbl&gt;, ep_m_014 &lt;dbl&gt;, ep_m_1524 &lt;dbl&gt;, ep_m_2534 &lt;dbl&gt;, …</pre>
 </div>
 <p>This dataset records information about tuberculosis diagnoses collected by the WHO. Two columns are already variables and are easy to interpret: <code>country</code> and <code>year</code>. They are followed by 56 columns like <code>sp_m_014</code>, <code>ep_m_4554</code>, and <code>rel_m_3544</code>. If you stare at these columns for long enough, you’ll notice there’s a pattern. Each column name is made up of three pieces separated by <code>_</code>. The first piece, <code>sp</code>/<code>sn</code>/<code>rel</code>/<code>ep</code>, describes the method used for the <code>diagnosis</code>; the second piece, <code>m</code>/<code>f</code>, is the <code>gender</code>; and the third piece, <code>014</code>/<code>1524</code>/<code>2534</code>/<code>3544</code>/<code>4554</code>/<code>5564</code>/<code>65</code>, is the <code>age</code> range.</p>
 <p>So in this case we have six variables: two variables are already columns, three variables are contained in the column names, and one variable is in the cell values. This requires two changes to our call to <code><a href="https://tidyr.tidyverse.org/reference/pivot_longer.html">pivot_longer()</a></code>: <code>names_to</code> gets a vector of column names and <code>names_sep</code> describes how to split the original column name into pieces:</p>
@@ -446,15 +446,15 @@ Widening data</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">cms_patient_experience
 #&gt; # A tibble: 500 × 5
-#&gt;   org_pac_id org_nm                     measure_cd   measure_title    prf_r…¹
+#&gt;   org_pac_id org_nm                     measure_cd   measure_title   prf_rate
 #&gt;   &lt;chr&gt;      &lt;chr&gt;                      &lt;chr&gt;        &lt;chr&gt;              &lt;dbl&gt;
-#&gt; 1 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_1  CAHPS for MIPS …      63
-#&gt; 2 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_2  CAHPS for MIPS …      87
-#&gt; 3 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_3  CAHPS for MIPS …      86
-#&gt; 4 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_5  CAHPS for MIPS …      57
-#&gt; 5 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_8  CAHPS for MIPS …      85
-#&gt; 6 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_12 CAHPS for MIPS …      24
-#&gt; # … with 494 more rows, and abbreviated variable name ¹prf_rate</pre>
+#&gt; 1 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_1  CAHPS for MIPS…       63
+#&gt; 2 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_2  CAHPS for MIPS…       87
+#&gt; 3 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_3  CAHPS for MIPS…       86
+#&gt; 4 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_5  CAHPS for MIPS…       57
+#&gt; 5 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_8  CAHPS for MIPS…       85
+#&gt; 6 0446157747 USC CARE MEDICAL GROUP INC CAHPS_GRP_12 CAHPS for MIPS…       24
+#&gt; # … with 494 more rows</pre>
 </div>
 <p>An observation is an organization, but each organization is spread across six rows, with one row for each variable, or measure. We can see the complete set of values for <code>measure_cd</code> and <code>measure_title</code> by using <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>:</p>
 <div class="cell">
@@ -479,17 +479,16 @@ Widening data</h2>
    values_from = prf_rate
  )
 #&gt; # A tibble: 500 × 9
-#&gt;   org_pac_id org_nm   measu…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶ CAHPS…⁷
-#&gt;   &lt;chr&gt;      &lt;chr&gt;    &lt;chr&gt;     &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;
-#&gt; 1 0446157747 USC CAR… CAHPS …      63      NA      NA      NA      NA      NA
-#&gt; 2 0446157747 USC CAR… CAHPS …      NA      87      NA      NA      NA      NA
-#&gt; 3 0446157747 USC CAR… CAHPS …      NA      NA      86      NA      NA      NA
-#&gt; 4 0446157747 USC CAR… CAHPS …      NA      NA      NA      57      NA      NA
-#&gt; 5 0446157747 USC CAR… CAHPS …      NA      NA      NA      NA      85      NA
-#&gt; 6 0446157747 USC CAR… CAHPS …      NA      NA      NA      NA      NA      24
-#&gt; # … with 494 more rows, and abbreviated variable names ¹measure_title,
-#&gt; #   ²CAHPS_GRP_1, ³CAHPS_GRP_2, ⁴CAHPS_GRP_3, ⁵CAHPS_GRP_5, ⁶CAHPS_GRP_8,
-#&gt; #   ⁷CAHPS_GRP_12</pre>
+#&gt;   org_pac_id org_nm         measure_title CAHPS_GRP_1 CAHPS_GRP_2 CAHPS_GRP_3
+#&gt;   &lt;chr&gt;      &lt;chr&gt;          &lt;chr&gt;               &lt;dbl&gt;       &lt;dbl&gt;       &lt;dbl&gt;
+#&gt; 1 0446157747 USC CARE MEDI… CAHPS for MI…          63          NA          NA
+#&gt; 2 0446157747 USC CARE MEDI… CAHPS for MI…          NA          87          NA
+#&gt; 3 0446157747 USC CARE MEDI… CAHPS for MI…          NA          NA          86
+#&gt; 4 0446157747 USC CARE MEDI… CAHPS for MI…          NA          NA          NA
+#&gt; 5 0446157747 USC CARE MEDI… CAHPS for MI…          NA          NA          NA
+#&gt; 6 0446157747 USC CARE MEDI… CAHPS for MI…          NA          NA          NA
+#&gt; # … with 494 more rows, and 3 more variables: CAHPS_GRP_5 &lt;dbl&gt;,
+#&gt; #   CAHPS_GRP_8 &lt;dbl&gt;, CAHPS_GRP_12 &lt;dbl&gt;</pre>
 </div>
 <p>The output doesn’t look quite right; we still seem to have multiple rows for each organization. That’s because, by default, <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> will attempt to preserve all the existing columns, including <code>measure_title</code>, which has six distinct values for each organization. To fix this problem, we need to tell <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> which columns identify each row; in this case, those are the variables starting with <code>"org"</code>:</p>
 <div class="cell">
@@ -500,16 +499,16 @@ Widening data</h2>
    values_from = prf_rate
  )
 #&gt; # A tibble: 95 × 8
-#&gt;   org_pac_id org_nm           CAHPS…¹ CAHPS…² CAHPS…³ CAHPS…⁴ CAHPS…⁵ CAHPS…⁶
-#&gt;   &lt;chr&gt;      &lt;chr&gt;              &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;
-#&gt; 1 0446157747 USC CARE MEDICA…      63      87      86      57      85      24
-#&gt; 2 0446162697 ASSOCIATION OF …      59      85      83      63      88      22
-#&gt; 3 0547164295 BEAVER MEDICAL …      49      NA      75      44      73      12
-#&gt; 4 0749333730 CAPE PHYSICIANS…      67      84      85      65      82      24
-#&gt; 5 0840104360 ALLIANCE PHYSIC…      66      87      87      64      87      28
-#&gt; 6 0840109864 REX HOSPITAL INC      73      87      84      67      91      30
-#&gt; # … with 89 more rows, and abbreviated variable names ¹CAHPS_GRP_1,
-#&gt; #   ²CAHPS_GRP_2, ³CAHPS_GRP_3, ⁴CAHPS_GRP_5, ⁵CAHPS_GRP_8, ⁶CAHPS_GRP_12</pre>
+#&gt;   org_pac_id org_nm           CAHPS_GRP_1 CAHPS_GRP_2 CAHPS_GRP_3 CAHPS_GRP_5
+#&gt;   &lt;chr&gt;      &lt;chr&gt;                  &lt;dbl&gt;       &lt;dbl&gt;       &lt;dbl&gt;       &lt;dbl&gt;
+#&gt; 1 0446157747 USC CARE MEDICA…          63          87          86          57
+#&gt; 2 0446162697 ASSOCIATION OF …          59          85          83          63
+#&gt; 3 0547164295 BEAVER MEDICAL …          49          NA          75          44
+#&gt; 4 0749333730 CAPE PHYSICIANS…          67          84          85          65
+#&gt; 5 0840104360 ALLIANCE PHYSIC…          66          87          87          64
+#&gt; 6 0840109864 REX HOSPITAL INC          73          87          84          67
+#&gt; # … with 89 more rows, and 2 more variables: CAHPS_GRP_8 &lt;dbl&gt;,
+#&gt; #   CAHPS_GRP_12 &lt;dbl&gt;</pre>
 </div>
 <p>This gives us the output that we’re looking for.</p>
 </section>
@@ -826,7 +825,7 @@ Pragmatic computation</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">cms_patient_care |&gt; 
  filter(type == "observed") |&gt; 
-  ggplot(aes(score)) + 
+  ggplot(aes(x = score)) + 
  geom_histogram(binwidth = 2) + 
  facet_wrap(vars(measure_abbr))
 #&gt; Warning: Removed 1 rows containing non-finite values (`stat_bin()`).</pre>
@@ -842,7 +841,7 @@ Pragmatic computation</h2>
    names_from = measure_abbr,
    values_from = score
  ) |&gt; 
-  ggplot(aes(dyspnea_screening, dyspena_treatment)) + 
+  ggplot(aes(x = dyspnea_screening, y = dyspena_treatment)) + 
  geom_point() + 
  coord_equal()</pre>
 </div>