Re-render book for O'Reilly

2023-01-12 17:22:57 -06:00
parent 28671ed8bd
commit 360d65ae47
113 changed files with 4957 additions and 2997 deletions
@@ -60,7 +60,7 @@ df |&gt; mutate(
 <section id="writing-a-function" data-type="sect2">
 <h2>
 Writing a function</h2>
-<p>To write a function you need to first analyse your repeated code to figure what parts are constant and what parts vary. If we take the code above and pull it outside of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> it’s a little easier to see the pattern because each repetition is now one line:</p>
+<p>To write a function you need to first analyse your repeated code to figure out what parts are constant and what parts vary. If we take the code above and pull it outside of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, it’s a little easier to see the pattern because each repetition is now one line:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">(a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
 (b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))
@@ -73,8 +73,8 @@ Writing a function</h2>
 </div>
 <p>To turn this into a function you need three things:</p>
 <ol type="1"><li><p>A <strong>name</strong>. Here we’ll use <code>rescale01</code> because this function rescales a vector to lie between 0 and 1.</p></li>
-<li><p>The <strong>arguments</strong>. The arguments are things that vary across calls and our analysis above tells us that have just one. We’ll call it <code>x</code> because this is the conventional name for a numeric vector.</p></li>
-<li><p>The <strong>body</strong>. The body is the code that repeated across all the calls.</p></li>
+<li><p>The <strong>arguments</strong>. The arguments are things that vary across calls and our analysis above tells us that we have just one. We’ll call it <code>x</code> because this is the conventional name for a numeric vector.</p></li>
+<li><p>The <strong>body</strong>. The body is the code that’s repeated across all the calls.</p></li>
 </ol><p>Then you create a function by following the template:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">name &lt;- function(arguments) {
@@ -117,7 +117,7 @@ rescale01(c(1, 2, 3, NA, 5))
 <section id="improving-our-function" data-type="sect2">
 <h2>
 Improving our function</h2>
-<p>You might notice <code>rescale01()</code> function does some unnecessary work — instead of computing <code><a href="https://rdrr.io/r/base/Extremes.html">min()</a></code> twice and <code><a href="https://rdrr.io/r/base/Extremes.html">max()</a></code> once we could instead compute both the minimum and maximum in one step with <code><a href="https://rdrr.io/r/base/range.html">range()</a></code>:</p>
+<p>You might notice that the <code>rescale01()</code> function does some unnecessary work: rather than computing <code><a href="https://rdrr.io/r/base/Extremes.html">min()</a></code> twice and <code><a href="https://rdrr.io/r/base/Extremes.html">max()</a></code> once, we could compute both the minimum and maximum in one step with <code><a href="https://rdrr.io/r/base/range.html">range()</a></code>:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">rescale01 &lt;- function(x) {
  rng &lt;- range(x, na.rm = TRUE)
@@ -136,6 +136,7 @@ rescale01(x)
  rng &lt;- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
 }
+
 rescale01(x)
 #&gt;  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
 #&gt;  [8] 0.7777778 0.8888889 1.0000000       Inf</pre>
@@ -146,14 +147,14 @@ rescale01(x)
 <section id="mutate-functions" data-type="sect2">
 <h2>
 Mutate functions</h2>
-<p>Now you’ve got the basic idea of functions, lets take a look a whole bunch of examples. We’ll start by looking at “mutate” functions, functions that work well like <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> because they return an output the same length as the input.</p>
-<p>Lets start with a simple variation of <code>rescale01()</code>. Maybe you want compute the Z-score, rescaling a vector to have to a mean of zero and a standard deviation of one:</p>
+<p>Now you’ve got the basic idea of functions, let’s take a look at a whole bunch of examples. We’ll start by looking at “mutate” functions, i.e. functions that work well inside of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> because they return an output of the same length as the input.</p>
+<p>Let’s start with a simple variation of <code>rescale01()</code>. Maybe you want to compute the Z-score, rescaling a vector to have a mean of zero and a standard deviation of one:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">z_score &lt;- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
 }</pre>
 </div>
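 <p>As a quick sanity check, the sketch below applies <code>z_score()</code> to a small made-up vector (the input values are only an illustration). If the function behaves as described, the result should have a mean of approximately zero and a standard deviation of approximately one, up to floating point error:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">z_score &lt;- function(x) {
   (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
 }

 # Made-up input with one missing value; the NA stays NA in the output
 z &lt;- z_score(c(1, 2, 3, NA, 5))

 # Both checks allow for a tiny floating point error
 abs(mean(z, na.rm = TRUE)) &lt; 1e-8
 abs(sd(z, na.rm = TRUE) - 1) &lt; 1e-8</pre>
 </div>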
-<p>Or maybe you want to wrap up a straightforward <code><a href="https://dplyr.tidyverse.org/reference/case_when.html">case_when()</a></code> in order to give it a useful name. For example, this <code>clamp()</code> function ensures all values of a vector lie in between a minimum or a maximum:</p>
+<p>Or maybe you want to wrap up a straightforward <code><a href="https://dplyr.tidyverse.org/reference/case_when.html">case_when()</a></code> and give it a useful name. For example, this <code>clamp()</code> function ensures all values of a vector lie between a minimum and a maximum:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">clamp &lt;- function(x, min, max) {
  case_when(
@@ -162,6 +163,7 @@ Mutate functions</h2>
    .default = x
  )
 }
+
 clamp(1:10, min = 3, max = 7)
 #&gt;  [1] 3 3 3 4 5 6 7 7 7 7</pre>
 </div>
@@ -174,15 +176,17 @@ clamp(1:10, min = 3, max = 7)
    .default = x
  )
 }
+
 na_outside(1:10, min = 3, max = 7)
 #&gt;  [1] NA NA  3  4  5  6  7 NA NA NA</pre>
 </div>
-<p>Of course functions don’t just need to work with numeric variables. You might want to extract out some repeated string manipulation. Maybe you need to make the first character upper case:</p>
+<p>Of course functions don’t just need to work with numeric variables. You might want to do some repeated string manipulation. Maybe you need to make the first character upper case:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">first_upper &lt;- function(x) {
  str_sub(x, 1, 1) &lt;- str_to_upper(str_sub(x, 1, 1))
  x
 }
+
 first_upper("hello")
 #&gt; [1] "Hello"</pre>
 </div>
@@ -198,12 +202,13 @@ clean_number &lt;- function(x) {
    as.numeric(x)
  if_else(is_pct, num / 100, num)
 }
+
 clean_number("$12,300")
 #&gt; [1] 12300
 clean_number("45%")
 #&gt; [1] 0.45</pre>
 </div>
-<p>Sometimes your functions will be highly specialized for one data analysis. For example, if you have a bunch of variables that record missing values as 997, 998, or 999, you might want to write a function to replace them with <code>NA</code>:</p>
+<p>Sometimes your functions will be highly specialized for one data analysis step. For example, if you have a bunch of variables that record missing values as 997, 998, or 999, you might want to write a function to replace them with <code>NA</code>:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">fix_na &lt;- function(x) {
  if_else(x %in% c(997, 998, 999), NA, x)
@@ -237,14 +242,16 @@ Summary functions</h2>
 <pre data-type="programlisting" data-code-language="r">commas &lt;- function(x) {
  str_flatten(x, collapse = ", ", last = " and ")
 }
+
 commas(c("cat", "dog", "pigeon"))
 #&gt; [1] "cat, dog and pigeon"</pre>
 </div>
-<p>Or you might wrap up a simple computation, like for the coefficient of variation, which divides standard deviation by the mean:</p>
+<p>Or you might wrap up a simple computation, like for the coefficient of variation, which divides the standard deviation by the mean:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">cv &lt;- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
 }
+
 cv(runif(100, min = 0, max = 50))
 #&gt; [1] 0.5196276
 cv(runif(100, min = 0, max = 500))
@@ -318,42 +325,62 @@ Data frame functions</h1>
 <section id="indirection-and-tidy-evaluation" data-type="sect2">
 <h2>
 Indirection and tidy evaluation</h2>
-<p>When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection. Let’s illustrate the problem with a very simple function: <code>pull_unique()</code>. The goal of this function is to <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> the unique (distinct) values of a variable:</p>
+<p>When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection. Let’s illustrate the problem with a very simple function: <code>grouped_mean()</code>. The goal of this function is to compute the mean of <code>mean_var</code> grouped by <code>group_var</code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">pull_unique &lt;- function(df, var) {
+<pre data-type="programlisting" data-code-language="r">grouped_mean &lt;- function(df, group_var, mean_var) {
  df |&gt; 
-    distinct(var) |&gt; 
-    pull(var)
+    group_by(group_var) |&gt; 
+    summarize(mean(mean_var))
 }</pre>
 </div>
 <p>If we try and use it, we get an error:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">diamonds |&gt; pull_unique(clarity)
-#&gt; Error in `distinct()` at ]8;line = 38:col = 2;file:///Users/hadleywickham/Documents/dplyr/dplyr/R/pull.Rdplyr/R/pull.R:38:2]8;;:
-#&gt; ! Must use existing variables.
-#&gt; ✖ `var` not found in `.data`.</pre>
+<pre data-type="programlisting" data-code-language="r">diamonds |&gt; grouped_mean(cut, carat)
+#&gt; Error in `group_by()`:
+#&gt; ! Must group by variables found in `.data`.
+#&gt; ✖ Column `group_var` is not found.</pre>
 </div>
-<p>To make the problem a bit more clear we can use a made up data frame:</p>
+<p>To make the problem a bit more clear, we can use a made up data frame:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(var = "var", x = "x", y = "y")
-df |&gt; pull_unique(x)
-#&gt; [1] "var"
-df |&gt; pull_unique(y)
-#&gt; [1] "var"</pre>
+<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
+  mean_var = 1,
+  group_var = "g",
+  group = 1,
+  x = 10,
+  y = 100
+)
+
+df |&gt; grouped_mean(group, x)
+#&gt; # A tibble: 1 × 2
+#&gt;   group_var `mean(mean_var)`
+#&gt;   &lt;chr&gt;                &lt;dbl&gt;
+#&gt; 1 g                        1
+df |&gt; grouped_mean(group, y)
+#&gt; # A tibble: 1 × 2
+#&gt;   group_var `mean(mean_var)`
+#&gt;   &lt;chr&gt;                &lt;dbl&gt;
+#&gt; 1 g                        1</pre>
 </div>
-<p>Regardless of how we call <code>pull_unique()</code> it always does <code>df |&gt; distinct(var) |&gt; pull(var)</code>, instead of <code>df |&gt; distinct(x) |&gt; pull(x)</code> or <code>df |&gt; distinct(y) |&gt; pull(y)</code>. This is a problem of indirection, and it arises because dplyr uses <strong>tidy evaluation</strong> to allow you to refer to the names of variables inside your data frame without any special treatment.</p>
-<p>Tidy evaluation is great 95% of the time because it makes your data analyses very concise as you never have to say which data frame a variable comes from; it’s obvious from the context. The downside of tidy evaluation comes when we want to wrap up repeated tidyverse code into a function. Here we need some way to tell <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> not to treat <code>var</code> as the name of a variable, but instead look inside <code>var</code> for the variable we actually want to use.</p>
+<p>Regardless of how we call <code>grouped_mean()</code> it always does <code>df |&gt; group_by(group_var) |&gt; summarize(mean(mean_var))</code>, instead of <code>df |&gt; group_by(group) |&gt; summarize(mean(x))</code> or <code>df |&gt; group_by(group) |&gt; summarize(mean(y))</code>. This is a problem of indirection, and it arises because dplyr uses <strong>tidy evaluation</strong> to allow you to refer to the names of variables inside your data frame without any special treatment.</p>
+<p>Tidy evaluation is great 95% of the time because it makes your data analyses very concise as you never have to say which data frame a variable comes from; it’s obvious from the context. The downside of tidy evaluation comes when we want to wrap up repeated tidyverse code into a function. Here we need some way to tell <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> not to treat <code>group_var</code> and <code>mean_var</code> as the names of variables, but instead to look inside them for the variables we actually want to use.</p>
 <p>Tidy evaluation includes a solution to this problem called <strong>embracing</strong> 🤗. Embracing a variable means to wrap it in braces so (e.g.) <code>var</code> becomes <code>{{ var }}</code>. Embracing a variable tells dplyr to use the value stored inside the argument, not the argument as the literal variable name. One way to remember what’s happening is to think of <code>{{ }}</code> as looking down a tunnel — <code>{{ var }}</code> will make a dplyr function look inside of <code>var</code> rather than looking for a variable called <code>var</code>.</p>
-<p>So to make <code>pull_unique()</code> work we need to replace <code>var</code> with <code>{{ var }}</code>:</p>
+<p>So to make <code>grouped_mean()</code> work, we need to surround <code>group_var</code> and <code>mean_var</code> with <code>{{ }}</code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">pull_unique &lt;- function(df, var) {
+<pre data-type="programlisting" data-code-language="r">grouped_mean &lt;- function(df, group_var, mean_var) {
  df |&gt; 
-    distinct({{ var }}) |&gt; 
-    pull({{ var }})
+    group_by({{ group_var }}) |&gt; 
+    summarize(mean({{ mean_var }}))
 }
-diamonds |&gt; pull_unique(clarity)
-#&gt; [1] SI2  SI1  VS1  VS2  VVS2 VVS1 I1   IF  
-#&gt; Levels: I1 &lt; SI2 &lt; SI1 &lt; VS2 &lt; VS1 &lt; VVS2 &lt; VVS1 &lt; IF</pre>
+
+diamonds |&gt; grouped_mean(cut, carat)
+#&gt; # A tibble: 5 × 2
+#&gt;   cut       `mean(carat)`
+#&gt;   &lt;ord&gt;             &lt;dbl&gt;
+#&gt; 1 Fair              1.05 
+#&gt; 2 Good              0.849
+#&gt; 3 Very Good         0.806
+#&gt; 4 Premium           0.892
+#&gt; 5 Ideal             0.703</pre>
 </div>
 <p>Success!</p>
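 <p>To double-check that the indirection problem is gone, we can pass the made-up data frame from earlier through the embraced version. This is a minimal sketch that assumes dplyr is loaded. Because <code>x</code> is 10 and <code>y</code> is 100 in the single row, the summaries should now be 10 and 100 rather than the value of the column literally named <code>mean_var</code>:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">library(dplyr)

 grouped_mean &lt;- function(df, group_var, mean_var) {
   df |&gt;
     group_by({{ group_var }}) |&gt;
     summarize(mean({{ mean_var }}))
 }

 # Same made-up data frame as above: the columns named `mean_var` and
 # `group_var` exist only to expose the indirection problem
 df &lt;- tibble(
   mean_var = 1,
   group_var = "g",
   group = 1,
   x = 10,
   y = 100
 )

 # The calls now group by `group` and average the requested column
 df |&gt; grouped_mean(group, x)
 df |&gt; grouped_mean(group, y)</pre>
 </div>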
 </section>
@@ -361,11 +388,11 @@ diamonds |&gt; pull_unique(clarity)
 <section id="sec-embracing" data-type="sect2">
 <h2>
 When to embrace?</h2>
-<p>So the key challenge in writing data frame functions is figuring out which arguments need to be embraced. Fortunately this is easy because you can look it up from the documentation 😄. There are two terms to look for in the docs which corresponding to the two most common sub-types of tidy evaluation:</p>
-<ul><li><p><strong>Data-masking</strong>: this is used in functions like <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> that compute with variables.</p></li>
-<li><p><strong>Tidy-selection</strong>: this is used for for functions like <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename()</a></code> that select variables.</p></li>
+<p>So the key challenge in writing data frame functions is figuring out which arguments need to be embraced. Fortunately, this is easy because you can look it up from the documentation 😄. There are two terms to look for in the docs which correspond to the two most common sub-types of tidy evaluation:</p>
+<ul><li><p><strong>Data-masking</strong>: this is used in functions like <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> that compute with variables.</p></li>
+<li><p><strong>Tidy-selection</strong>: this is used for functions like <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename()</a></code> that select variables.</p></li>
 </ul><p>Your intuition about which arguments use tidy evaluation should be good for many common functions — just think about whether you can compute (e.g. <code>x + 1</code>) or select (e.g. <code>a:x</code>).</p>
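 <p>The distinction can be sketched with a tiny made-up tibble (the column names <code>a</code>, <code>b</code>, and <code>x</code> are only illustrative, and dplyr is assumed to be loaded). The argument to <code>mutate()</code> is an expression that computes with a column, while the argument to <code>select()</code> describes which columns to keep:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">library(dplyr)

 df &lt;- tibble(a = 1:3, b = 4:6, x = 7:9)

 # Data-masking: `x + 1` is computed using the values of column `x`
 masked &lt;- df |&gt; mutate(y = x + 1)

 # Tidy-selection: `a:x` picks every column from `a` through `x`
 selected &lt;- df |&gt; select(a:x)</pre>
 </div>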
-<p>In the following sections we’ll explore the sorts of handy functions you might write once you understand embracing.</p>
+<p>In the following sections, we’ll explore the sorts of handy functions you might write once you understand embracing.</p>
 </section>

 <section id="common-use-cases" data-type="sect2">
@@ -374,7 +401,7 @@ Common use cases</h2>
 <p>If you commonly perform the same set of summaries when doing initial data exploration, you might consider wrapping them up in a helper function:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">summary6 &lt;- function(data, var) {
-  data |&gt; summarise(
+  data |&gt; summarize(
    min = min({{ var }}, na.rm = TRUE),
    mean = mean({{ var }}, na.rm = TRUE),
    median = median({{ var }}, na.rm = TRUE),
@@ -384,14 +411,15 @@ Common use cases</h2>
    .groups = "drop"
  )
 }
+
 diamonds |&gt; summary6(carat)
 #&gt; # A tibble: 1 × 6
 #&gt;     min  mean median   max     n n_miss
 #&gt;   &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt;  &lt;int&gt;
 #&gt; 1   0.2 0.798    0.7  5.01 53940      0</pre>
 </div>
-<p>(Whenever you wrap <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> in a helper, we think it’s good practice to set <code>.groups = "drop"</code> to both avoid the message and leave the data in an ungrouped state.)</p>
-<p>The nice thing about this function is because it wraps <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> you can used it on grouped data:</p>
+<p>(Whenever you wrap <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> in a helper, we think it’s good practice to set <code>.groups = "drop"</code> to both avoid the message and leave the data in an ungrouped state.)</p>
+<p>The nice thing about this function is, because it wraps <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>, you can use it on grouped data:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">diamonds |&gt; 
  group_by(cut) |&gt; 
@@ -405,7 +433,7 @@ diamonds |&gt; summary6(carat)
 #&gt; 4 Premium    0.2  0.892   0.86  4.01 13791      0
 #&gt; 5 Ideal      0.2  0.703   0.54  3.5  21551      0</pre>
 </div>
-<p>Because the arguments to summarize are data-masking that also means that the <code>var</code> argument to <code>summary6()</code> is data-masking. That means you can also summarize computed variables:</p>
+<p>Furthermore, since the arguments to <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> are data-masking, the <code>var</code> argument to <code>summary6()</code> is also data-masking. That means you can also summarize computed variables:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">diamonds |&gt; 
  group_by(cut) |&gt; 
@@ -419,8 +447,8 @@ diamonds |&gt; summary6(carat)
 #&gt; 4 Premium   -0.699 -0.125  -0.0655 0.603 13791      0
 #&gt; 5 Ideal     -0.699 -0.225  -0.268  0.544 21551      0</pre>
 </div>
-<p>To summarize multiple variables you’ll need to wait until <a href="#sec-across" data-type="xref">#sec-across</a>, where you’ll learn how to use <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>.</p>
-<p>Another popular <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> helper function is a version of <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code> that also computes proportions:</p>
+<p>To summarize multiple variables, you’ll need to wait until <a href="#sec-across" data-type="xref">#sec-across</a>, where you’ll learn how to use <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>.</p>
+<p>Another popular <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> helper function is a version of <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code> that also computes proportions:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># https://twitter.com/Diabb6/status/1571635146658402309
 count_prop &lt;- function(df, var, sort = FALSE) {
@@ -428,6 +456,7 @@ count_prop &lt;- function(df, var, sort = FALSE) {
    count({{ var }}, sort = sort) |&gt;
    mutate(prop = n / sum(n))
 }
+
 diamonds |&gt; count_prop(clarity)
 #&gt; # A tibble: 8 × 3
 #&gt;   clarity     n   prop
@@ -447,26 +476,36 @@ diamonds |&gt; count_prop(clarity)
  df |&gt; 
    filter({{ condition }}) |&gt; 
    distinct({{ var }}) |&gt; 
-    arrange({{ var }}) |&gt; 
-    pull({{ var }})
+    arrange({{ var }})
 }

 # Find all the destinations in December
 flights |&gt; unique_where(month == 12, dest)
-#&gt;  [1] "ABQ" "ALB" "ATL" "AUS" "AVL" "BDL" "BGR" "BHM" "BNA" "BOS" "BQN" "BTV"
-#&gt; [13] "BUF" "BUR" "BWI" "BZN" "CAE" "CAK" "CHS" "CLE" "CLT" "CMH" "CVG" "DAY"
-#&gt; [25] "DCA" "DEN" "DFW" "DSM" "DTW" "EGE" "EYW" "FLL" "GRR" "GSO" "GSP" "HDN"
-#&gt; [37] "HNL" "HOU" "IAD" "IAH" "ILM" "IND" "JAC" "JAX" "LAS" "LAX" "LGB" "MCI"
-#&gt; [49] "MCO" "MDW" "MEM" "MHT" "MIA" "MKE" "MSN" "MSP" "MSY" "MTJ" "OAK" "OKC"
-#&gt; [61] "OMA" "ORD" "ORF" "PBI" "PDX" "PHL" "PHX" "PIT" "PSE" "PSP" "PVD" "PWM"
-#&gt; [73] "RDU" "RIC" "ROC" "RSW" "SAN" "SAT" "SAV" "SBN" "SDF" "SEA" "SFO" "SJC"
-#&gt; [85] "SJU" "SLC" "SMF" "SNA" "SRQ" "STL" "STT" "SYR" "TPA" "TUL" "TYS" "XNA"
+#&gt; # A tibble: 96 × 1
+#&gt;   dest 
+#&gt;   &lt;chr&gt;
+#&gt; 1 ABQ  
+#&gt; 2 ALB  
+#&gt; 3 ATL  
+#&gt; 4 AUS  
+#&gt; 5 AVL  
+#&gt; 6 BDL  
+#&gt; # … with 90 more rows
 # Which months did plane N14228 fly in?
 flights |&gt; unique_where(tailnum == "N14228", month)
-#&gt;  [1]  1  2  3  4  5  6  7  8  9 10 12</pre>
+#&gt; # A tibble: 11 × 1
+#&gt;   month
+#&gt;   &lt;int&gt;
+#&gt; 1     1
+#&gt; 2     2
+#&gt; 3     3
+#&gt; 4     4
+#&gt; 5     5
+#&gt; 6     6
+#&gt; # … with 5 more rows</pre>
 </div>
-<p>Here we embrace <code>condition</code> because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> and <code>var</code> because its passed to <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code>.</p>
-<p>We’ve made all these examples take a data frame as the first argument, but if you’re working repeatedly with the same data, it can make sense to hardcode it. For example, the following function always works with the flights dataset and always selects <code>time_hour</code>, <code>carrier</code>, and <code>flight</code> since they form the compound primary key that allows you to identify a row.</p>
+<p>Here we embrace <code>condition</code> because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> and <code>var</code> because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>.</p>
+<p>We’ve made all these examples take a data frame as the first argument, but if you’re working repeatedly with the same data, it can make sense to hardcode it. For example, the following function always works with the flights dataset and always selects <code>time_hour</code>, <code>carrier</code>, and <code>flight</code> since they form the compound primary key that allows you to identify a row.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">flights_sub &lt;- function(rows, cols) {
  flights |&gt; 
@@ -476,43 +515,45 @@ flights |&gt; unique_where(tailnum == "N14228", month)

 flights_sub(dest == "IAH", contains("time"))
 #&gt; # A tibble: 7,198 × 8
-#&gt;   time_hour           carrier flight dep_time sched…¹ arr_t…² sched…³ air_t…⁴
-#&gt;   &lt;dttm&gt;              &lt;chr&gt;    &lt;int&gt;    &lt;int&gt;   &lt;int&gt;   &lt;int&gt;   &lt;int&gt;   &lt;dbl&gt;
-#&gt; 1 2013-01-01 05:00:00 UA        1545      517     515     830     819     227
-#&gt; 2 2013-01-01 05:00:00 UA        1714      533     529     850     830     227
-#&gt; 3 2013-01-01 06:00:00 UA         496      623     627     933     932     229
-#&gt; 4 2013-01-01 07:00:00 UA         473      728     732    1041    1038     238
-#&gt; 5 2013-01-01 07:00:00 UA        1479      739     739    1104    1038     249
-#&gt; 6 2013-01-01 09:00:00 UA        1220      908     908    1228    1219     233
-#&gt; # … with 7,192 more rows, and abbreviated variable names ¹sched_dep_time,
-#&gt; #   ²arr_time, ³sched_arr_time, ⁴air_time</pre>
+#&gt;   time_hour           carrier flight dep_time sched_dep_time arr_time
+#&gt;   &lt;dttm&gt;              &lt;chr&gt;    &lt;int&gt;    &lt;int&gt;          &lt;int&gt;    &lt;int&gt;
+#&gt; 1 2013-01-01 05:00:00 UA        1545      517            515      830
+#&gt; 2 2013-01-01 05:00:00 UA        1714      533            529      850
+#&gt; 3 2013-01-01 06:00:00 UA         496      623            627      933
+#&gt; 4 2013-01-01 07:00:00 UA         473      728            732     1041
+#&gt; 5 2013-01-01 07:00:00 UA        1479      739            739     1104
+#&gt; 6 2013-01-01 09:00:00 UA        1220      908            908     1228
+#&gt; # … with 7,192 more rows, and 2 more variables: sched_arr_time &lt;int&gt;,
+#&gt; #   air_time &lt;dbl&gt;</pre>
 </div>
 </section>

-<section id="data-masking-vs-tidy-selection" data-type="sect2">
+<section id="data-masking-vs.-tidy-selection" data-type="sect2">
 <h2>
-Data-masking vs tidy-selection</h2>
+Data-masking vs. tidy-selection</h2>
 <p>Sometimes you want to select variables inside a function that uses data-masking. For example, imagine you want to write a <code>count_missing()</code> that counts the number of missing observations in rows. You might try writing something like:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">count_missing &lt;- function(df, group_vars, x_var) {
  df |&gt; 
    group_by({{ group_vars }}) |&gt; 
-    summarise(n_miss = sum(is.na({{ x_var }})))
+    summarize(n_miss = sum(is.na({{ x_var }})))
 }
+
 flights |&gt; 
  count_missing(c(year, month, day), dep_time)
-#&gt; Error in `group_by()` at ]8;line = 127:col = 2;file:///Users/hadleywickham/Documents/dplyr/dplyr/R/summarise.Rdplyr/R/summarise.R:127:2]8;;:
-#&gt; ℹ In argument: `..1 = c(year, month, day)`.
+#&gt; Error in `group_by()`:
+#&gt; ℹ In argument: `c(year, month, day)`.
 #&gt; Caused by error:
-#&gt; ! `..1` must be size 336776 or 1, not 1010328.</pre>
+#&gt; ! `c(year, month, day)` must be size 336776 or 1, not 1010328.</pre>
 </div>
-<p>This doesn’t work because <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> uses data-masking, not tidy-selection. We can work around that problem by using the handy <code><a href="https://dplyr.tidyverse.org/reference/pick.html">pick()</a></code> which allows you to use use tidy-selection inside data-masking functions:</p>
+<p>This doesn’t work because <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> uses data-masking, not tidy-selection. We can work around that problem by using the handy <code><a href="https://dplyr.tidyverse.org/reference/pick.html">pick()</a></code> function, which allows you to use tidy-selection inside data-masking functions:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">count_missing &lt;- function(df, group_vars, x_var) {
  df |&gt; 
    group_by(pick({{ group_vars }})) |&gt; 
-    summarise(n_miss = sum(is.na({{ x_var }})))
+    summarize(n_miss = sum(is.na({{ x_var }})))
 }
+
 flights |&gt; 
  count_missing(c(year, month, day), dep_time)
 #&gt; `summarise()` has grouped output by 'year', 'month'. You can override using
@@ -542,6 +583,7 @@ count_wide &lt;- function(data, rows, cols) {
      values_fill = 0
    )
 }
+
 diamonds |&gt; count_wide(clarity, cut)
 #&gt; # A tibble: 8 × 6
 #&gt;   clarity  Fair  Good `Very Good` Premium Ideal
@@ -572,9 +614,9 @@ diamonds |&gt; count_wide(c(clarity, color), cut)
 <h2>
 Exercises</h2>
 <ol type="1"><li>
-<p>Using the datasets from nyclights13, write functions that:</p>
+<p>Using the datasets from nycflights13, write a function that:</p>
 <ol type="1"><li>
-<p>Find all flights that were cancelled (i.e. <code>is.na(arr_time)</code>) or delayed by more than an hour.</p>
+<p>Finds all flights that were cancelled (i.e. <code>is.na(arr_time)</code>) or delayed by more than an hour.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">flights |&gt; filter_severe()</pre>
 </div>
@@ -582,7 +624,7 @@ Exercises</h2>
 <li>
 <p>Counts the number of cancelled flights and the number of flights delayed by more than an hour.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">flights |&gt; group_by(dest) |&gt; summarise_severe()</pre>
+<pre data-type="programlisting" data-code-language="r">flights |&gt; group_by(dest) |&gt; summarize_severe()</pre>
 </div>
 </li>
 <li>
@@ -592,19 +634,19 @@ Exercises</h2>
 </div>
 </li>
 <li>
-<p>Summarizes the weather to compute the minum, mean, and maximum, of a user supplied variable:</p>
+<p>Summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">weather |&gt; summarise_weather(temp)</pre>
+<pre data-type="programlisting" data-code-language="r">weather |&gt; summarize_weather(temp)</pre>
 </div>
 </li>
 <li>
-<p>Converts the user supplied variable that uses clock time (e.g. <code>dep_time</code>, <code>arr_time</code>, etc) into a decimal time (i.e. hours + minutes / 60).</p>
+<p>Converts the user supplied variable that uses clock time (e.g. <code>dep_time</code>, <code>arr_time</code>, etc.) into a decimal time (i.e. hours + (minutes / 60)).</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">flights |&gt; standardise_time(sched_dep_time)</pre>
 </div>
 </li>
 </ol></li>
-<li><p>For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename_with()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_min()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_sample()</a></code>.</p></li>
+<li><p>For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename_with()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_min()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_sample()</a></code>.</p></li>
 <li>
 <p>Generalize the following function so that you can supply any number of variables to count.</p>
 <div class="cell">
@@ -621,21 +663,21 @@ Exercises</h2>
 <section id="plot-functions" data-type="sect1">
 <h1>
 Plot functions</h1>
-<p>Instead of returning a data frame, you might want to return a plot. Fortunately you can use the same techniques with ggplot2, because <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function. For example, imagine that you’re making a lot of histograms:</p>
+<p>Instead of returning a data frame, you might want to return a plot. Fortunately, you can use the same techniques with ggplot2, because <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function. For example, imagine that you’re making a lot of histograms:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">diamonds |&gt; 
-  ggplot(aes(carat)) +
+  ggplot(aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

 diamonds |&gt; 
-  ggplot(aes(carat)) +
+  ggplot(aes(x = carat)) +
  geom_histogram(binwidth = 0.05)</pre>
 </div>
-<p>Wouldn’t it be nice if you could wrap this up into a histogram function? This is easy as once you know that <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function so that you need to embrace:</p>
+<p>Wouldn’t it be nice if you could wrap this up into a histogram function? This is easy as pie once you know that <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function and you need to embrace:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">histogram &lt;- function(df, var, binwidth = NULL) {
  df |&gt; 
-    ggplot(aes({{ var }})) + 
+    ggplot(aes(x = {{ var }})) + 
    geom_histogram(binwidth = binwidth)
 }

@@ -644,7 +686,7 @@ diamonds |&gt; histogram(carat, 0.1)</pre>
 <p><img src="functions_files/figure-html/unnamed-chunk-46-1.png" class="img-fluid" width="576"/></p>
 </div>
 </div>
-<p>Note that <code>histogram()</code> returns a ggplot2 plot, so that you can still add on additional components if you want. Just remember to switch from <code>|&gt;</code> to <code>+</code>:</p>
+<p>Note that <code>histogram()</code> returns a ggplot2 plot, meaning you can still add on additional components if you want. Just remember to switch from <code>|&gt;</code> to <code>+</code>:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">diamonds |&gt; 
  histogram(carat, 0.1) +
@@ -660,10 +702,9 @@ More variables</h2>
 <p>It’s straightforward to add more variables to the mix. For example, maybe you want an easy way to eyeball whether or not a data set is linear by overlaying a smooth line and a straight line:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># https://twitter.com/tyler_js_smith/status/1574377116988104704
-
 linearity_check &lt;- function(df, x, y) {
  df |&gt;
-    ggplot(aes({{ x }}, {{ y }})) +
+    ggplot(aes(x = {{ x }}, y = {{ y }})) +
    geom_point() +
    geom_smooth(method = "loess", color = "red", se = FALSE) +
    geom_smooth(method = "lm", color = "blue", se = FALSE) 
@@ -683,13 +724,14 @@ starwars |&gt;
 <pre data-type="programlisting" data-code-language="r"># https://twitter.com/ppaxisa/status/1574398423175921665
 hex_plot &lt;- function(df, x, y, z, bins = 20, fun = "mean") {
  df |&gt; 
-    ggplot(aes({{ x }}, {{ y }}, z = {{ z }})) + 
+    ggplot(aes(x = {{ x }}, y = {{ y }}, z = {{ z }})) + 
    stat_summary_hex(
-      aes(colour = after_scale(fill)), # make border same colour as fill
+      aes(color = after_scale(fill)), # make border same color as fill
      bins = bins, 
      fun = fun,
    )
 }
+
 diamonds |&gt; hex_plot(carat, price, depth)</pre>
 <div class="cell-output-display">
 <p><img src="functions_files/figure-html/unnamed-chunk-49-1.png" class="img-fluid" width="576"/></p>
@@ -708,17 +750,19 @@ Combining with dplyr</h2>
    ggplot(aes(y = {{ var }})) + 
    geom_bar()
 }
+
 diamonds |&gt; sorted_bars(cut)</pre>
 <div class="cell-output-display">
 <p><img src="functions_files/figure-html/unnamed-chunk-50-1.png" class="img-fluid" width="576"/></p>
 </div>
 </div>
-<p>Or you could maybe you want to make it easy to draw a bar plot just for a subset of the data:</p>
+<p>We have to use a new operator here, <code>:=</code>, because we are generating the variable name based on user-supplied data. Variable names go on the left-hand side of <code>=</code>, but R’s syntax doesn’t allow anything to the left of <code>=</code> except for a single literal name. To work around this problem, we use the special operator <code>:=</code>, which tidy evaluation treats in exactly the same way as <code>=</code>.</p>
+<p>Or maybe you want to make it easy to draw a bar plot just for a subset of the data:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">conditional_bars &lt;- function(df, condition, var) {
  df |&gt; 
    filter({{ condition }}) |&gt; 
-    ggplot(aes({{ var }})) + 
+    ggplot(aes(x = {{ var }})) + 
    geom_bar()
 }

@@ -727,17 +771,16 @@ diamonds |&gt; conditional_bars(cut == "Good", clarity)</pre>
 <p><img src="functions_files/figure-html/unnamed-chunk-51-1.png" class="img-fluid" width="576"/></p>
 </div>
 </div>
-<p>You can also get creative and display data summaries in other way. For example, this code uses the axis labels to display the highest value. As you learn more about ggplot2, the power of your functions will continue to increase.</p>
+<p>You can also get creative and display data summaries in other ways. For example, this code uses the axis labels to display the highest value. As you learn more about ggplot2, the power of your functions will continue to increase.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># https://gist.github.com/GShotwell/b19ef520b6d56f61a830fabb3454965b
-
 fancy_ts &lt;- function(df, val, group) {
  labs &lt;- df |&gt; 
-    group_by({{group}}) |&gt; 
-    summarize(breaks = max({{val}}))
+    group_by({{ group }}) |&gt; 
+    summarize(breaks = max({{ val }}))
  
  df |&gt; 
-    ggplot(aes(date, {{val}}, group = {{group}}, color = {{group}})) +
+    ggplot(aes(x = date, y = {{ val }}, group = {{ group }}, color = {{ group }})) +
    geom_path() +
    scale_y_continuous(
      breaks = labs$breaks, 
@@ -753,6 +796,7 @@ df &lt;- tibble(
  dist4 = sort(rnorm(50, 15, 1)),
  date = seq.Date(as.Date("2022-01-01"), as.Date("2022-04-10"), by = "2 days")
 )
+
 df &lt;- pivot_longer(df, cols = -date, names_to = "dist_name", values_to = "value")

 fancy_ts(df, value, dist_name)</pre>
@@ -766,26 +810,26 @@ fancy_ts(df, value, dist_name)</pre>
 <section id="faceting" data-type="sect2">
 <h2>
 Faceting</h2>
-<p>Unfortunately programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work. so you have to learn a new syntax. When programming with facets, instead of writing <code>~ x</code>, you need to write <code>vars(x)</code> and instead of <code>~ x + y</code> you need to write <code>vars(x, y)</code>. The only advantage of this syntax is that <code><a href="https://ggplot2.tidyverse.org/reference/vars.html">vars()</a></code> uses tidy evaluation so you can embrace within it:</p>
+<p>Unfortunately, programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work. So you have to learn a new syntax. When programming with facets, instead of writing <code>~ x</code>, you need to write <code>vars(x)</code> and instead of <code>~ x + y</code> you need to write <code>vars(x, y)</code>. The only advantage of this syntax is that <code><a href="https://ggplot2.tidyverse.org/reference/vars.html">vars()</a></code> uses tidy evaluation so you can embrace within it:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># https://twitter.com/sharoz/status/1574376332821204999
-
 foo &lt;- function(x) {
-  ggplot(mtcars, aes(mpg, disp)) +
+  ggplot(mtcars, aes(x = mpg, y = disp)) +
    geom_point() +
    facet_wrap(vars({{ x }}))
 }
+
 foo(cyl)</pre>
 <div class="cell-output-display">
 <p><img src="functions_files/figure-html/unnamed-chunk-53-1.png" class="img-fluid" width="576"/></p>
 </div>
 </div>
-<p>As with data frame functions, it can be useful to make your plotting functions tightly coupled to a specific dataset, or even a specific variable. For example, the following function makes it particularly easy to interactively explore the conditional distribution <code>bill_length_mm</code> from palmerpenguins dataset.</p>
+<p>As with data frame functions, it can be useful to make your plotting functions tightly coupled to a specific dataset, or even a specific variable. For example, the following function makes it particularly easy to interactively explore the conditional distribution of <code>carat</code> from the diamonds dataset.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># https://twitter.com/yutannihilat_en/status/1574387230025875457
-density &lt;- function(colour, facets, binwidth = 0.1) {
+density &lt;- function(color, facets, binwidth = 0.1) {
  diamonds |&gt; 
-    ggplot(aes(carat, after_stat(density), colour = {{ colour }})) +
+    ggplot(aes(x = carat, y = after_stat(density), color = {{ color }})) +
    geom_freqpoly(binwidth = binwidth) +
    facet_wrap(vars({{ facets }}))
 }
@@ -812,18 +856,18 @@ Labeling</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">histogram &lt;- function(df, var, binwidth = NULL) {
  df |&gt; 
-    ggplot(aes({{ var }})) + 
+    ggplot(aes(x = {{ var }})) + 
    geom_histogram(binwidth = binwidth)
 }</pre>
 </div>
-<p>Wouldn’t it be nice if we could label the output with the variable and the bin width that was used? To do so, we’re going to have to go under the covers of tidy evaluation and use a function from package we haven’t talked about before: rlang. rlang is a low-level package that’s used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).</p>
-<p>To solve the labeling problem we can use <code><a href="https://rlang.r-lib.org/reference/englue.html">rlang::englue()</a></code>. This works similarly to <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code>, so any value wrapped in <code><a href="https://rdrr.io/r/base/Paren.html">{ }</a></code> will be inserted into the string. But it also understands <code>{{ }}</code>, which automatically insert the appropriate variable name:</p>
+<p>Wouldn’t it be nice if we could label the output with the variable and the bin width that was used? To do so, we’re going to have to go under the covers of tidy evaluation and use a function from a package we haven’t talked about yet: rlang. rlang is a low-level package that’s used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).</p>
+<p>To solve the labeling problem we can use <code><a href="https://rlang.r-lib.org/reference/englue.html">rlang::englue()</a></code>. This works similarly to <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code>, so any value wrapped in <code><a href="https://rdrr.io/r/base/Paren.html">{ }</a></code> will be inserted into the string. But it also understands <code>{{ }}</code>, which automatically inserts the appropriate variable name:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">histogram &lt;- function(df, var, binwidth) {
  label &lt;- rlang::englue("A histogram of {{var}} with binwidth {binwidth}")
  
  df |&gt; 
-    ggplot(aes({{ var }})) + 
+    ggplot(aes(x = {{ var }})) + 
    geom_histogram(binwidth = binwidth) + 
    labs(title = label)
 }
@@ -833,17 +877,16 @@ diamonds |&gt; histogram(carat, 0.1)</pre>
 <p><img src="functions_files/figure-html/unnamed-chunk-56-1.png" class="img-fluid" width="576"/></p>
 </div>
 </div>
-<p>You can use the same approach any other place that you might supply a string in a ggplot2 plot.</p>
+<p>You can use the same approach in any other place where you want to supply a string in a ggplot2 plot.</p>
 </section>

 <section id="exercises-2" data-type="sect2">
 <h2>
 Exercises</h2>
-<ol type="1"><li>Build up a rich plotting function by incrementally implementing each of the steps below.
+<p>Build up a rich plotting function by incrementally implementing each of the steps below:</p>
 <ol type="1"><li><p>Draw a scatterplot given dataset and <code>x</code> and <code>y</code> variables.</p></li>
 <li><p>Add a line of best fit (i.e. a linear model with no standard errors).</p></li>
 <li><p>Add a title.</p></li>
-</ol></li>
 </ol></section>
 </section>

@@ -866,21 +909,20 @@ collapse_years()</pre>
 <p>R also doesn’t care about how you use white space in your functions but future readers will. Continue to follow the rules from <a href="#chp-workflow-style" data-type="xref">#chp-workflow-style</a>. Additionally, <code>function()</code> should always be followed by squiggly brackets (<code><a href="https://rdrr.io/r/base/Paren.html">{}</a></code>), and the contents should be indented by an additional two spaces. This makes it easier to see the hierarchy in your code by skimming the left-hand margin.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r"># missing extra two spaces
-pull_unique &lt;- function(df, var) {
-df |&gt; 
-  distinct({{ var }}) |&gt; 
-  pull({{ var }})
+density &lt;- function(color, facets, binwidth = 0.1) {
+diamonds |&gt; 
+  ggplot(aes(x = carat, y = after_stat(density), color = {{ color }})) +
+  geom_freqpoly(binwidth = binwidth) +
+  facet_wrap(vars({{ facets }}))
 }

 # Pipe indented incorrectly
-pull_unique &lt;- function(df, var) {
-  df |&gt; 
-  distinct({{ var }}) |&gt; 
-  pull({{ var }})
-}
-
-# Missing {} and all one line
-pull_unique &lt;- function(df, var) df |&gt; distinct({{ var }}) |&gt; pull({{ var }})</pre>
+density &lt;- function(color, facets, binwidth = 0.1) {
+  diamonds |&gt; 
+  ggplot(aes(x = carat, y = after_stat(density), color = {{ color }})) +
+  geom_freqpoly(binwidth = binwidth) +
+  facet_wrap(vars({{ facets }}))
+}</pre>
 </div>
 <p>As you can see we recommend putting extra spaces inside of <code>{{ }}</code>. This makes it very obvious that something unusual is happening.</p>

@@ -893,20 +935,21 @@ Exercises</h2>
 <pre data-type="programlisting" data-code-language="r">f1 &lt;- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
 }
+
 f3 &lt;- function(x, y) {
  rep(y, length.out = length(x))
 }</pre>
 </div>
 </li>
 <li><p>Take a function that you’ve written recently and spend 5 minutes brainstorming a better name for it and its arguments.</p></li>
-<li><p>Make a case for why <code>norm_r()</code>, <code>norm_d()</code> etc would be better than <code><a href="https://rdrr.io/r/stats/Normal.html">rnorm()</a></code>, <code><a href="https://rdrr.io/r/stats/Normal.html">dnorm()</a></code>. Make a case for the opposite.</p></li>
+<li><p>Make a case for why <code>norm_r()</code>, <code>norm_d()</code> etc. would be better than <code><a href="https://rdrr.io/r/stats/Normal.html">rnorm()</a></code>, <code><a href="https://rdrr.io/r/stats/Normal.html">dnorm()</a></code>. Make a case for the opposite.</p></li>
 </ol></section>
 </section>

 <section id="summary" data-type="sect1">
 <h1>
 Summary</h1>
-<p>In this chapter you learned how to write functions for three useful scenarios: creating a vector, creating a data frames, or creating a plot. Along the way your saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.</p>
+<p>In this chapter, you learned how to write functions for three useful scenarios: creating a vector, creating a data frame, or creating a plot. Along the way you saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.</p>
 <p>We have only shown you the bare minimum to get started with functions and there’s much more to learn. A few places to learn more are:</p>
 <ul><li>To learn more about programming with tidy evaluation, see useful recipes in <a href="https://dplyr.tidyverse.org/articles/programming.html">programming with dplyr</a> and <a href="https://tidyr.tidyverse.org/articles/programming.html">programming with tidyr</a> and learn more about the theory in <a href="https://rlang.r-lib.org/reference/topic-data-mask.html">What is data-masking and why do I need {{?</a>.</li>
 <li>To learn more about reducing duplication in your ggplot2 code, read the <a href="https://ggplot2-book.org/programming.html" class="uri">Programming with ggplot2</a> chapter of the ggplot2 book.</li>