Don't transform non-crossref links

Hadley Wickham
2022-11-18 10:30:32 -06:00
parent 4caea5281b
commit 78a1c12fe7
32 changed files with 693 additions and 693 deletions


@@ -23,7 +23,7 @@ Introduction</h1>
<ul><li>Vector functions take one or more vectors as input and return a vector as output.</li>
<li>Data frame functions take a data frame as input and return a data frame as output.</li>
<li>Plot functions that take a data frame as input and return a plot as output.</li>
</ul><p>Each of these sections include many examples to help you generalize the patterns that you see. These examples wouldnt be possible without the help of folks of twitter, and we encourage follow the links in the comment to see original inspirations. You might also want to read the original motivating tweets for <a href="#chp-https://twitter.com/hadleywickham/status/1571603361350164486" data-type="xref">#chp-https://twitter.com/hadleywickham/status/1571603361350164486</a> and <a href="#chp-https://twitter.com/hadleywickham/status/1574373127349575680" data-type="xref">#chp-https://twitter.com/hadleywickham/status/1574373127349575680</a> to see even more functions.</p>
</ul><p>Each of these sections includes many examples to help you generalize the patterns that you see. These examples wouldn’t be possible without the help of folks on Twitter, and we encourage you to follow the links in the comments to see the original inspirations. You might also want to read the original motivating tweets for <a href="https://twitter.com/hadleywickham/status/1571603361350164486">general functions</a> and <a href="https://twitter.com/hadleywickham/status/1574373127349575680">plotting functions</a> to see even more functions.</p>
<section id="prerequisites" data-type="sect2">
<h2>
@@ -72,7 +72,7 @@ df |&gt; mutate(
<section id="writing-a-function" data-type="sect2">
<h2>
Writing a function</h2>
<p>To write a function you need to first analyse your repeated code to figure what parts are constant and what parts vary. If we take the code above and pull it outside of <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> its a little easier to see the pattern because each repetition is now one line:</p>
<p>To write a function you need to first analyse your repeated code to figure out what parts are constant and what parts vary. If we take the code above and pull it outside of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, it’s a little easier to see the pattern because each repetition is now one line:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">(a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
(b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))
@@ -106,7 +106,7 @@ Writing a function</h2>
rescale01(c(1, 2, 3, NA, 5))
#&gt; [1] 0.00 0.25 0.50 NA 1.00</pre>
</div>
<p>Then you can rewrite the call to <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> as:</p>
<p>Then you can rewrite the call to <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> as:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df |&gt; mutate(
a = rescale01(a),
@@ -123,13 +123,13 @@ rescale01(c(1, 2, 3, NA, 5))
#&gt; 4 0.795 0.531 0 1
#&gt; 5 1 0.518 0.580 0.394</pre>
</div>
<p>(In <a href="#chp-iteration" data-type="xref">#chp-iteration</a>, youll learn how to use <code><a href="#chp-https://dplyr.tidyverse.org/reference/across" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/across</a></code> to reduce the duplication even further so all you need is <code>df |&gt; mutate(across(a:d, rescale01))</code>).</p>
<p>(In <a href="#chp-iteration" data-type="xref">#chp-iteration</a>, you’ll learn how to use <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code> to reduce the duplication even further so all you need is <code>df |&gt; mutate(across(a:d, rescale01))</code>.)</p>
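<p>As a rough sketch of what that one-liner does, the block below restates <code>rescale01()</code> using the min/max formula shown earlier (the exact body is an assumption here) and applies it to every column of a small made-up tibble:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)

# Assumed definition, following the min/max formula shown earlier.
rescale01 &lt;- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Made-up data, purely for illustration.
df &lt;- tibble(a = c(1, 2, 3), b = c(10, 20, 40))

# across() applies rescale01() to each column selected by a:b.
rescaled &lt;- df |&gt; mutate(across(a:b, rescale01))
rescaled</pre>
</div>
<p>Each selected column is rescaled independently, so both <code>a</code> and <code>b</code> end up running from 0 to 1.</p>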
</section>
<section id="improving-our-function" data-type="sect2">
<h2>
Improving our function</h2>
<p>You might notice <code>rescale01()</code> function does some unnecessary work — instead of computing <code><a href="#chp-https://rdrr.io/r/base/Extremes" data-type="xref">#chp-https://rdrr.io/r/base/Extremes</a></code> twice and <code><a href="#chp-https://rdrr.io/r/base/Extremes" data-type="xref">#chp-https://rdrr.io/r/base/Extremes</a></code> once we could instead compute both the minimum and maximum in one step with <code><a href="#chp-https://rdrr.io/r/base/range" data-type="xref">#chp-https://rdrr.io/r/base/range</a></code>:</p>
<p>You might notice that the <code>rescale01()</code> function does some unnecessary work: instead of computing <code><a href="https://rdrr.io/r/base/Extremes.html">min()</a></code> twice and <code><a href="https://rdrr.io/r/base/Extremes.html">max()</a></code> once, we could compute both the minimum and maximum in one step with <code><a href="https://rdrr.io/r/base/range.html">range()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">rescale01 &lt;- function(x) {
rng &lt;- range(x, na.rm = TRUE)
@@ -142,7 +142,7 @@ Improving our function</h2>
rescale01(x)
#&gt; [1] 0 0 0 0 0 0 0 0 0 0 NaN</pre>
</div>
<p>That result is not particularly useful so we could ask <code><a href="#chp-https://rdrr.io/r/base/range" data-type="xref">#chp-https://rdrr.io/r/base/range</a></code> to ignore infinite values:</p>
<p>That result is not particularly useful so we could ask <code><a href="https://rdrr.io/r/base/range.html">range()</a></code> to ignore infinite values:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">rescale01 &lt;- function(x) {
rng &lt;- range(x, na.rm = TRUE, finite = TRUE)
@@ -158,14 +158,14 @@ rescale01(x)
<section id="mutate-functions" data-type="sect2">
<h2>
Mutate functions</h2>
<p>Now youve got the basic idea of functions, lets take a look a whole bunch of examples. Well start by looking at “mutate” functions, functions that work well like <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> because they return an output the same length as the input.</p>
<p>Now that you’ve got the basic idea of functions, let’s take a look at a whole bunch of examples. We’ll start by looking at “mutate” functions, i.e. functions that work well inside of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> because they return an output of the same length as the input.</p>
<p>Let’s start with a simple variation of <code>rescale01()</code>. Maybe you want to compute the Z-score, rescaling a vector to have a mean of zero and a standard deviation of one:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">z_score &lt;- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}</pre>
</div>
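<p>As a quick check of the definition above, here is a minimal usage sketch on a made-up vector (the input values are an assumption chosen so the arithmetic is easy to follow). The non-missing values have mean 2 and standard deviation 1, so they map to -1, 0, and 1, while the missing value stays missing:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">z_score &lt;- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up input: mean(c(1, 2, 3)) is 2 and sd(c(1, 2, 3)) is 1.
z &lt;- z_score(c(1, 2, 3, NA))
z</pre>
</div>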
<p>Or maybe you want to wrap up a straightforward <code><a href="#chp-https://dplyr.tidyverse.org/reference/case_when" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/case_when</a></code> in order to give it a useful name. For example, this <code>clamp()</code> function ensures all values of a vector lie in between a minimum or a maximum:</p>
<p>Or maybe you want to wrap up a straightforward <code><a href="https://dplyr.tidyverse.org/reference/case_when.html">case_when()</a></code> in order to give it a useful name. For example, this <code>clamp()</code> function ensures all values of a vector lie between a minimum and a maximum:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">clamp &lt;- function(x, min, max) {
case_when(
@@ -244,7 +244,7 @@ haversine &lt;- function(long1, lat1, long2, lat2, round = 3) {
<section id="summary-functions" data-type="sect2">
<h2>
Summary functions</h2>
<p>Another important family of vector functions is summary functions, functions that return a single value for use in <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code>. Sometimes this can just be a matter of setting a default argument or two:</p>
<p>Another important family of vector functions is summary functions, functions that return a single value for use in <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>. Sometimes this can just be a matter of setting a default argument or two:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">commas &lt;- function(x) {
str_flatten(x, collapse = ", ", last = " and ")
@@ -332,7 +332,7 @@ Data frame functions</h1>
<section id="indirection-and-tidy-evaluation" data-type="sect2">
<h2>
Indirection and tidy evaluation</h2>
<p>When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection. Lets illustrate the problem with a very simple function: <code>pull_unique()</code>. The goal of this function is to <code><a href="#chp-https://dplyr.tidyverse.org/reference/pull" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pull</a></code> the unique (distinct) values of a variable:</p>
<p>When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection. Let’s illustrate the problem with a very simple function: <code>pull_unique()</code>. The goal of this function is to <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> the unique (distinct) values of a variable:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">pull_unique &lt;- function(df, var) {
df |&gt;
@@ -356,7 +356,7 @@ df |&gt; pull_unique(y)
#&gt; [1] "var"</pre>
</div>
<p>Regardless of how we call <code>pull_unique()</code> it always does <code>df |&gt; distinct(var) |&gt; pull(var)</code>, instead of <code>df |&gt; distinct(x) |&gt; pull(x)</code> or <code>df |&gt; distinct(y) |&gt; pull(y)</code>. This is a problem of indirection, and it arises because dplyr uses <strong>tidy evaluation</strong> to allow you to refer to the names of variables inside your data frame without any special treatment.</p>
<p>Tidy evaluation is great 95% of the time because it makes your data analyses very concise as you never have to say which data frame a variable comes from; its obvious from the context. The downside of tidy evaluation comes when we want to wrap up repeated tidyverse code into a function. Here we need some way to tell <code><a href="#chp-https://dplyr.tidyverse.org/reference/distinct" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/distinct</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/pull" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pull</a></code> not to treat <code>var</code> as the name of a variable, but instead look inside <code>var</code> for the variable we actually want to use.</p>
<p>Tidy evaluation is great 95% of the time because it makes your data analyses very concise as you never have to say which data frame a variable comes from; it’s obvious from the context. The downside of tidy evaluation comes when we want to wrap up repeated tidyverse code into a function. Here we need some way to tell <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> not to treat <code>var</code> as the name of a variable, but instead look inside <code>var</code> for the variable we actually want to use.</p>
<p>Tidy evaluation includes a solution to this problem called <strong>embracing</strong> 🤗. Embracing a variable means wrapping it in braces so (e.g.) <code>var</code> becomes <code>{{ var }}</code>. Embracing a variable tells dplyr to use the value stored inside the argument, not the argument as the literal variable name. One way to remember what’s happening is to think of <code>{{ }}</code> as looking down a tunnel: <code>{{ var }}</code> will make a dplyr function look inside of <code>var</code> rather than looking for a variable called <code>var</code>.</p>
<p>So to make <code>pull_unique()</code> work we need to replace <code>var</code> with <code>{{ var }}</code>:</p>
<div class="cell">
@@ -376,8 +376,8 @@ diamonds |&gt; pull_unique(clarity)
<h2>
When to embrace?</h2>
<p>So the key challenge in writing data frame functions is figuring out which arguments need to be embraced. Fortunately this is easy because you can look it up in the documentation 😄. There are two terms to look for in the docs, which correspond to the two most common sub-types of tidy evaluation:</p>
<ul><li><p><strong>Data-masking</strong>: this is used in functions like <code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code>, and <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> that compute with variables.</p></li>
<li><p><strong>Tidy-selection</strong>: this is used for for functions like <code><a href="#chp-https://dplyr.tidyverse.org/reference/select" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/select</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/relocate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/relocate</a></code>, and <code><a href="#chp-https://dplyr.tidyverse.org/reference/rename" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/rename</a></code> that select variables.</p></li>
<ul><li><p><strong>Data-masking</strong>: this is used in functions like <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> that compute with variables.</p></li>
<li><p><strong>Tidy-selection</strong>: this is used for functions like <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename()</a></code> that select variables.</p></li>
</ul><p>Your intuition about which arguments use tidy evaluation should be good for many common functions: just think about whether you can compute (e.g. <code>x + 1</code>) or select (e.g. <code>a:x</code>).</p>
<p>In the following sections we’ll explore the sorts of handy functions you might write once you understand embracing.</p>
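<p>To make the distinction concrete, here is a minimal sketch with two hypothetical helpers (<code>rows_where()</code> and <code>cols_only()</code> are names invented for this illustration, as is the data). The first embraces an argument passed to a data-masking function, so you can compute with columns; the second embraces an argument passed to a tidy-selection function, so you can select with helpers like <code>a:b</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)

# Hypothetical helper: filter() uses data-masking, so `condition`
# can be any expression computed from the columns.
rows_where &lt;- function(df, condition) {
  df |&gt; filter({{ condition }})
}

# Hypothetical helper: select() uses tidy-selection, so `cols`
# can be any selection such as a:b or c(a, b).
cols_only &lt;- function(df, cols) {
  df |&gt; select({{ cols }})
}

# Made-up data for illustration.
df &lt;- tibble(a = 1:3, b = c(10, 20, 30), c = c("x", "y", "z"))

masked &lt;- df |&gt; rows_where(b + 1 &gt; 15)  # keeps the rows where b is 20 or 30
selected &lt;- df |&gt; cols_only(a:b)           # keeps columns a and b</pre>
</div>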
</section>
@@ -404,8 +404,8 @@ diamonds |&gt; summary6(carat)
#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt;
#&gt; 1 0.2 0.798 0.7 5.01 53940 0</pre>
</div>
<p>(Whenever you wrap <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> in a helper, we think its good practice to set <code>.groups = "drop"</code> to both avoid the message and leave the data in an ungrouped state.)</p>
<p>The nice thing about this function is because it wraps <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> you can used it on grouped data:</p>
<p>(Whenever you wrap <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> in a helper, we think it’s good practice to set <code>.groups = "drop"</code> to both avoid the message and leave the data in an ungrouped state.)</p>
<p>The nice thing about this function is that, because it wraps <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code>, you can use it on grouped data:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">diamonds |&gt;
group_by(cut) |&gt;
@@ -433,8 +433,8 @@ diamonds |&gt; summary6(carat)
#&gt; 4 Premium -0.699 -0.125 -0.0655 0.603 13791 0
#&gt; 5 Ideal -0.699 -0.225 -0.268 0.544 21551 0</pre>
</div>
<p>To summarize multiple variables youll need to wait until <a href="#sec-across" data-type="xref">#sec-across</a>, where youll learn how to use <code><a href="#chp-https://dplyr.tidyverse.org/reference/across" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/across</a></code>.</p>
<p>Another popular <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> helper function is a version of <code><a href="#chp-https://dplyr.tidyverse.org/reference/count" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/count</a></code> that also computes proportions:</p>
<p>To summarize multiple variables you’ll need to wait until <a href="#sec-across" data-type="xref">#sec-across</a>, where you’ll learn how to use <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>.</p>
<p>Another popular <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> helper function is a version of <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code> that also computes proportions:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># https://twitter.com/Diabb6/status/1571635146658402309
count_prop &lt;- function(df, var, sort = FALSE) {
@@ -454,7 +454,7 @@ diamonds |&gt; count_prop(clarity)
#&gt; 6 VVS2 5066 0.0939
#&gt; # … with 2 more rows</pre>
</div>
<p>This function has three arguments: <code>df</code>, <code>var</code>, and <code>sort</code>, and only <code>var</code> needs to be embraced because its passed to <code><a href="#chp-https://dplyr.tidyverse.org/reference/count" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/count</a></code> which uses data-masking for all variables in <code></code>.</p>
<p>This function has three arguments: <code>df</code>, <code>var</code>, and <code>sort</code>, and only <code>var</code> needs to be embraced because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>, which uses data-masking for all variables in <code>…</code>.</p>
<p>Or maybe you want to find the sorted unique values of a variable for a subset of the data. Rather than supplying a variable and a value to do the filtering, we’ll allow the user to supply a condition:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">unique_where &lt;- function(df, condition, var) {
@@ -479,7 +479,7 @@ flights |&gt; unique_where(month == 12, dest)
flights |&gt; unique_where(tailnum == "N14228", month)
#&gt; [1] 1 2 3 4 5 6 7 8 9 10 12</pre>
</div>
<p>Here we embrace <code>condition</code> because its passed to <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> and <code>var</code> because its passed to <code><a href="#chp-https://dplyr.tidyverse.org/reference/distinct" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/distinct</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code>, and <code><a href="#chp-https://dplyr.tidyverse.org/reference/pull" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pull</a></code>.</p>
<p>Here we embrace <code>condition</code> because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> and <code>var</code> because it’s passed to <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code>.</p>
<p>We’ve made all these examples take a data frame as the first argument, but if you’re working repeatedly with the same data, it can make sense to hardcode it. For example, the following function always works with the flights dataset and always selects <code>time_hour</code>, <code>carrier</code>, and <code>flight</code> since they form the compound primary key that allows you to identify a row.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">flights_sub &lt;- function(rows, cols) {
@@ -520,7 +520,7 @@ flights |&gt;
#&gt; Caused by error:
#&gt; ! `..1` must be size 336776 or 1, not 1010328.</pre>
</div>
<p>This doesnt work because <code><a href="#chp-https://dplyr.tidyverse.org/reference/group_by" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/group_by</a></code> uses data-masking, not tidy-selection. We can work around that problem by using the handy <code><a href="#chp-https://dplyr.tidyverse.org/reference/pick" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pick</a></code> which allows you to use use tidy-selection inside data-masking functions:</p>
<p>This doesn’t work because <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> uses data-masking, not tidy-selection. We can work around that problem by using the handy <code><a href="https://dplyr.tidyverse.org/reference/pick.html">pick()</a></code>, which allows you to use tidy-selection inside data-masking functions:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">count_missing &lt;- function(df, group_vars, x_var) {
df |&gt;
@@ -543,7 +543,7 @@ flights |&gt;
#&gt; 6 2013 1 6 1
#&gt; # … with 359 more rows</pre>
</div>
<p>Another convenient use of <code><a href="#chp-https://dplyr.tidyverse.org/reference/pick" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pick</a></code> is to make a 2d table of counts. Here we count using all the variables in the <code>rows</code> and <code>columns</code>, then use <code><a href="#chp-https://tidyr.tidyverse.org/reference/pivot_wider" data-type="xref">#chp-https://tidyr.tidyverse.org/reference/pivot_wider</a></code> to rearrange the counts into a grid:</p>
<p>Another convenient use of <code><a href="https://dplyr.tidyverse.org/reference/pick.html">pick()</a></code> is to make a 2d table of counts. Here we count using all the variables in the <code>rows</code> and <code>columns</code>, then use <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> to rearrange the counts into a grid:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># https://twitter.com/pollicipes/status/1571606508944719876
count_wide &lt;- function(data, rows, cols) {
@@ -579,7 +579,7 @@ diamonds |&gt; count_wide(c(clarity, color), cut)
#&gt; 6 I1 I 34 9 8 24 17
#&gt; # … with 50 more rows</pre>
</div>
<p>While our examples have mostly focused on dplyr, tidy evaluation also underpins tidyr, and if you look at the <code><a href="#chp-https://tidyr.tidyverse.org/reference/pivot_wider" data-type="xref">#chp-https://tidyr.tidyverse.org/reference/pivot_wider</a></code> docs you can see that <code>names_from</code> uses tidy-selection.</p>
<p>While our examples have mostly focused on dplyr, tidy evaluation also underpins tidyr, and if you look at the <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> docs you can see that <code>names_from</code> uses tidy-selection.</p>
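<p>Because <code>names_from</code> uses tidy-selection, you can embrace an argument directly inside <code>pivot_wider()</code>. The following is a minimal sketch under that assumption; <code>spread_counts()</code> and the data are hypothetical and exist only for this illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(dplyr)
library(tidyr)

# Hypothetical helper: count two variables, then spread the second
# variable across the columns.
spread_counts &lt;- function(df, row_var, col_var) {
  df |&gt;
    count({{ row_var }}, {{ col_var }}) |&gt;  # count() uses data-masking
    pivot_wider(
      names_from = {{ col_var }},           # names_from uses tidy-selection
      values_from = n,
      values_fill = 0
    )
}

# Made-up data: group "b" never occurs together with "y".
df &lt;- tibble(g = c("a", "a", "b"), h = c("x", "y", "x"))
wide &lt;- df |&gt; spread_counts(g, h)
wide</pre>
</div>
<p>The result has one row per value of <code>g</code> and one count column per value of <code>h</code>, with the missing combination filled in as 0.</p>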
</section>
<section id="exercises-1" data-type="sect2">
@@ -618,7 +618,7 @@ Exercises</h2>
</div>
</li>
</ol></li>
<li><p>For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: <code><a href="#chp-https://dplyr.tidyverse.org/reference/distinct" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/distinct</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/count" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/count</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/group_by" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/group_by</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/rename" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/rename</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/slice" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/slice</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/slice" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/slice</a></code>.</p></li>
<li><p>For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename_with()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_min()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/slice.html">slice_sample()</a></code>.</p></li>
<li>
<p>Generalize the following function so that you can supply any number of variables to count.</p>
<div class="cell">
@@ -635,7 +635,7 @@ Exercises</h2>
<section id="plot-functions" data-type="sect1">
<h1>
Plot functions</h1>
<p>Instead of returning a data frame, you might want to return a plot. Fortunately you can use the same techniques with ggplot2, because <code><a href="#chp-https://ggplot2.tidyverse.org/reference/aes" data-type="xref">#chp-https://ggplot2.tidyverse.org/reference/aes</a></code> is a data-masking function. For example, imagine that youre making a lot of histograms:</p>
<p>Instead of returning a data frame, you might want to return a plot. Fortunately you can use the same techniques with ggplot2, because <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function. For example, imagine that you’re making a lot of histograms:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">diamonds |&gt;
ggplot(aes(carat)) +
@@ -645,7 +645,7 @@ diamonds |&gt;
ggplot(aes(carat)) +
geom_histogram(binwidth = 0.05)</pre>
</div>
<p>Wouldnt it be nice if you could wrap this up into a histogram function? This is easy as once you know that <code><a href="#chp-https://ggplot2.tidyverse.org/reference/aes" data-type="xref">#chp-https://ggplot2.tidyverse.org/reference/aes</a></code> is a data-masking function so that you need to embrace:</p>
<p>Wouldn’t it be nice if you could wrap this up into a histogram function? This is easy once you know that <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> is a data-masking function, so you need to embrace:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">histogram &lt;- function(df, var, binwidth = NULL) {
df |&gt;
@@ -714,7 +714,7 @@ diamonds |&gt; hex_plot(carat, price, depth)</pre>
<section id="combining-with-dplyr" data-type="sect2">
<h2>
Combining with dplyr</h2>
<p>Some of the most useful helpers combine a dash of dplyr with ggplot2. For example, if you might want to do a vertical bar chart where you automatically sort the bars in frequency order using <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_inorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_inorder</a></code>. Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top:</p>
<p>Some of the most useful helpers combine a dash of dplyr with ggplot2. For example, you might want to do a vertical bar chart where you automatically sort the bars in frequency order using <code><a href="https://forcats.tidyverse.org/reference/fct_inorder.html">fct_infreq()</a></code>. Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">sorted_bars &lt;- function(df, var) {
df |&gt;
@@ -780,7 +780,7 @@ fancy_ts(df, value, dist_name)</pre>
<section id="faceting" data-type="sect2">
<h2>
Faceting</h2>
<p>Unfortunately programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work. so you have to learn a new syntax. When programming with facets, instead of writing <code>~ x</code>, you need to write <code>vars(x)</code> and instead of <code>~ x + y</code> you need to write <code>vars(x, y)</code>. The only advantage of this syntax is that <code><a href="#chp-https://ggplot2.tidyverse.org/reference/vars" data-type="xref">#chp-https://ggplot2.tidyverse.org/reference/vars</a></code> uses tidy evaluation so you can embrace within it:</p>
<p>Unfortunately programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work, so you have to learn a new syntax. When programming with facets, instead of writing <code>~ x</code>, you need to write <code>vars(x)</code> and instead of <code>~ x + y</code> you need to write <code>vars(x, y)</code>. The only advantage of this syntax is that <code><a href="https://ggplot2.tidyverse.org/reference/vars.html">vars()</a></code> uses tidy evaluation so you can embrace within it:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># https://twitter.com/sharoz/status/1574376332821204999
@@ -831,7 +831,7 @@ Labeling</h2>
}</pre>
</div>
<p>Wouldn’t it be nice if we could label the output with the variable and the bin width that was used? To do so, we’re going to have to go under the covers of tidy evaluation and use a function from a package we haven’t talked about before: rlang. rlang is a low-level package that’s used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).</p>
<p>To solve the labeling problem we can use <code><a href="#chp-https://rlang.r-lib.org/reference/englue" data-type="xref">#chp-https://rlang.r-lib.org/reference/englue</a></code>. This works similarly to <code><a href="#chp-https://stringr.tidyverse.org/reference/str_glue" data-type="xref">#chp-https://stringr.tidyverse.org/reference/str_glue</a></code>, so any value wrapped in <code><a href="#chp-https://rdrr.io/r/base/Paren" data-type="xref">#chp-https://rdrr.io/r/base/Paren</a></code> will be inserted into the string. But it also understands <code>{{ }}</code>, which automatically insert the appropriate variable name:</p>
<p>To solve the labeling problem we can use <code><a href="https://rlang.r-lib.org/reference/englue.html">rlang::englue()</a></code>. This works similarly to <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code>, so any value wrapped in <code><a href="https://rdrr.io/r/base/Paren.html">{ }</a></code> will be inserted into the string. But it also understands <code>{{ }}</code>, which automatically inserts the appropriate variable name:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">histogram &lt;- function(df, var, binwidth) {
label &lt;- rlang::englue("A histogram of {{var}} with binwidth {binwidth}")
@@ -865,7 +865,7 @@ Exercises</h2>
<h1>
Style</h1>
<p>R doesn’t care what your function or arguments are called, but the names make a big difference for humans. Ideally, the name of your function will be short, but clearly evoke what the function does. That’s hard! But it’s better to be clear than short, as RStudio’s autocomplete makes it easy to type long names.</p>
<p>Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (e.g. <code><a href="#chp-https://rdrr.io/r/base/mean" data-type="xref">#chp-https://rdrr.io/r/base/mean</a></code> is better than <code>compute_mean()</code>), or accesses some property of an object (e.g. <code><a href="#chp-https://rdrr.io/r/stats/coef" data-type="xref">#chp-https://rdrr.io/r/stats/coef</a></code> is better than <code>get_coefficients()</code>). Use your best judgement and don’t be afraid to rename a function if you figure out a better name later.</p>
<p>Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (e.g. <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code> is better than <code>compute_mean()</code>), or accesses some property of an object (e.g. <code><a href="https://rdrr.io/r/stats/coef.html">coef()</a></code> is better than <code>get_coefficients()</code>). Use your best judgement and don’t be afraid to rename a function if you figure out a better name later.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># Too short
f()
@@ -877,7 +877,7 @@ my_awesome_function()
impute_missing()
collapse_years()</pre>
</div>
<p>R also doesn’t care about how you use white space in your functions, but future readers will. Continue to follow the rules from <a href="#chp-workflow-style" data-type="xref">#chp-workflow-style</a>. Additionally, <code>function()</code> should always be followed by squiggly brackets (<code><a href="#chp-https://rdrr.io/r/base/Paren" data-type="xref">#chp-https://rdrr.io/r/base/Paren</a></code>), and the contents should be indented by an additional two spaces. This makes it easier to see the hierarchy in your code by skimming the left-hand margin.</p>
<p>R also doesn’t care about how you use white space in your functions, but future readers will. Continue to follow the rules from <a href="#chp-workflow-style" data-type="xref">#chp-workflow-style</a>. Additionally, <code>function()</code> should always be followed by squiggly brackets (<code><a href="https://rdrr.io/r/base/Paren.html">{}</a></code>), and the contents should be indented by an additional two spaces. This makes it easier to see the hierarchy in your code by skimming the left-hand margin.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># missing extra two spaces
pull_unique &lt;- function(df, var) {
@@ -913,7 +913,7 @@ f3 &lt;- function(x, y) {
</div>
</li>
<li><p>Take a function that you’ve written recently and spend 5 minutes brainstorming a better name for it and its arguments.</p></li>
<li><p>Make a case for why <code>norm_r()</code>, <code>norm_d()</code> etc would be better than <code><a href="#chp-https://rdrr.io/r/stats/Normal" data-type="xref">#chp-https://rdrr.io/r/stats/Normal</a></code>, <code><a href="#chp-https://rdrr.io/r/stats/Normal" data-type="xref">#chp-https://rdrr.io/r/stats/Normal</a></code>. Make a case for the opposite.</p></li>
<li><p>Make a case for why <code>norm_r()</code>, <code>norm_d()</code> etc would be better than <code><a href="https://rdrr.io/r/stats/Normal.html">rnorm()</a></code>, <code><a href="https://rdrr.io/r/stats/Normal.html">dnorm()</a></code>. Make a case for the opposite.</p></li>
</ol></section>
</section>
@@ -922,9 +922,9 @@ f3 &lt;- function(x, y) {
Summary</h1>
<p>In this chapter you learned how to write functions for three useful scenarios: creating a vector, creating a data frame, or creating a plot. Along the way you saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.</p>
<p>We have only shown you the bare minimum to get started with functions and there’s much more to learn. A few places to learn more are:</p>
<ul><li>To learn more about programming with tidy evaluation, see useful recipes in <a href="#chp-https://dplyr.tidyverse.org/articles/programming" data-type="xref">#chp-https://dplyr.tidyverse.org/articles/programming</a> and <a href="#chp-https://tidyr.tidyverse.org/articles/programming" data-type="xref">#chp-https://tidyr.tidyverse.org/articles/programming</a> and learn more about the theory in <a href="#chp-https://rlang.r-lib.org/reference/topic-data-mask" data-type="xref">#chp-https://rlang.r-lib.org/reference/topic-data-mask</a>.</li>
<li>To learn more about reducing duplication in your ggplot2 code, read the <a href="#chp-https://ggplot2-book.org/programming" class="uri" data-type="xref">#chp-https://ggplot2-book.org/programming</a> chapter of the ggplot2 book.</li>
<li>For more advice on function style, see the <a href="#chp-https://style.tidyverse.org/functions" class="uri" data-type="xref">#chp-https://style.tidyverse.org/functions</a>.</li>
<ul><li>To learn more about programming with tidy evaluation, see useful recipes in <a href="https://dplyr.tidyverse.org/articles/programming.html">programming with dplyr</a> and <a href="https://tidyr.tidyverse.org/articles/programming.html">programming with tidyr</a> and learn more about the theory in <a href="https://rlang.r-lib.org/reference/topic-data-mask.html">What is data-masking and why do I need {{?</a>.</li>
<li>To learn more about reducing duplication in your ggplot2 code, read the <a href="https://ggplot2-book.org/programming.html" class="uri">Programming with ggplot2</a> chapter of the ggplot2 book.</li>
<li>For more advice on function style, see the <a href="https://style.tidyverse.org/functions.html" class="uri">tidyverse style guide</a>.</li>
</ul><p>In the next chapter, we’ll dive into some of the details of R’s vector data structures that we’ve omitted so far. These are not immediately useful by themselves, but are a necessary foundation for the following chapter on iteration, which gives you further tools for reducing code duplication.</p>