Don't transform non-crossref links

Hadley Wickham
2022-11-18 10:30:32 -06:00
parent 4caea5281b
commit 78a1c12fe7
32 changed files with 693 additions and 693 deletions


@@ -7,7 +7,7 @@
</div>
<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
<p>To finish off the programming section, we’re going to give you a quick tour of the most important base R functions that we don’t otherwise discuss in the book. These tools are particularly useful as you do more programming and will help you read code that you’ll encounter in the wild.</p><p>This is a good place to remind you that the tidyverse is not the only way to solve data science problems. We teach the tidyverse in this book because tidyverse packages share a common design philosophy, which increases the consistency across functions, making each new function or package a little easier to learn and use. It’s not possible to use the tidyverse without using base R, so we’ve actually already taught you a <strong>lot</strong> of base R functions: from <code><a href="#chp-https://rdrr.io/r/base/library" data-type="xref">#chp-https://rdrr.io/r/base/library</a></code> to load packages, to <code><a href="#chp-https://rdrr.io/r/base/sum" data-type="xref">#chp-https://rdrr.io/r/base/sum</a></code> and <code><a href="#chp-https://rdrr.io/r/base/mean" data-type="xref">#chp-https://rdrr.io/r/base/mean</a></code> for numeric summaries, to the factor, date, and POSIXct data types, and of course all the basic operators like <code>+</code>, <code>-</code>, <code>/</code>, <code>*</code>, <code>|</code>, <code>&amp;</code>, and <code>!</code>. What we haven’t focused on so far is base R workflows, so we will highlight a few of those in this chapter.</p><p>After you read this book you’ll learn other approaches to the same problems using base R, data.table, and other packages. You’ll certainly encounter these other approaches when you start reading R code written by other people, particularly if you’re using StackOverflow. It’s 100% okay to write code that uses a mix of approaches, and don’t let anyone tell you otherwise!</p><p>In this chapter, we’ll focus on four big topics: subsetting with <code>[</code>, subsetting with <code>[[</code> and <code>$</code>, the apply family of functions, and for loops. To finish off, we’ll briefly discuss two important plotting functions.</p>
<p>To finish off the programming section, we’re going to give you a quick tour of the most important base R functions that we don’t otherwise discuss in the book. These tools are particularly useful as you do more programming and will help you read code that you’ll encounter in the wild.</p><p>This is a good place to remind you that the tidyverse is not the only way to solve data science problems. We teach the tidyverse in this book because tidyverse packages share a common design philosophy, which increases the consistency across functions, making each new function or package a little easier to learn and use. It’s not possible to use the tidyverse without using base R, so we’ve actually already taught you a <strong>lot</strong> of base R functions: from <code><a href="https://rdrr.io/r/base/library.html">library()</a></code> to load packages, to <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code> and <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code> for numeric summaries, to the factor, date, and POSIXct data types, and of course all the basic operators like <code>+</code>, <code>-</code>, <code>/</code>, <code>*</code>, <code>|</code>, <code>&amp;</code>, and <code>!</code>. What we haven’t focused on so far is base R workflows, so we will highlight a few of those in this chapter.</p><p>After you read this book you’ll learn other approaches to the same problems using base R, data.table, and other packages. You’ll certainly encounter these other approaches when you start reading R code written by other people, particularly if you’re using StackOverflow. It’s 100% okay to write code that uses a mix of approaches, and don’t let anyone tell you otherwise!</p><p>In this chapter, we’ll focus on four big topics: subsetting with <code>[</code>, subsetting with <code>[[</code> and <code>$</code>, the apply family of functions, and for loops. To finish off, we’ll briefly discuss two important plotting functions.</p>
<section id="prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
@@ -63,7 +63,7 @@ x %% 2 == 0
x[x %% 2 == 0]
#&gt; [1] 10 NA 8 NA</pre>
</div>
<p>Note that, unlike <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code>, <code>NA</code> indices will be included in the output as <code>NA</code>s.</p>
<p>Note that, unlike <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>, <code>NA</code> indices will be included in the output as <code>NA</code>s.</p>
</li>
<li>
<p><strong>A character vector</strong>. If you have a named vector, you can subset it with a character vector:</p>
@@ -145,7 +145,7 @@ df2[, "x"]
dplyr equivalents</h2>
<p>A number of dplyr verbs are special cases of <code>[</code>:</p>
<ul><li>
<p><code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> is equivalent to subsetting the rows with a logical vector, taking care to exclude missing values:</p>
<p><code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> is equivalent to subsetting the rows with a logical vector, taking care to exclude missing values:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df &lt;- tibble(
x = c(2, 3, 1, 1, NA),
@@ -157,10 +157,10 @@ df |&gt; filter(x &gt; 1)
# same as
df[!is.na(df$x) &amp; df$x &gt; 1, ]</pre>
</div>
<p>Another common technique in the wild is to use <code><a href="#chp-https://rdrr.io/r/base/which" data-type="xref">#chp-https://rdrr.io/r/base/which</a></code> for its side-effect of dropping missing values: <code>df[which(df$x &gt; 1), ]</code>.</p>
<p>Another common technique in the wild is to use <code><a href="https://rdrr.io/r/base/which.html">which()</a></code> for its side-effect of dropping missing values: <code>df[which(df$x &gt; 1), ]</code>.</p>
</li>
<li>
<p><code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code> is equivalent to subsetting the rows with an integer vector, usually created with <code><a href="#chp-https://rdrr.io/r/base/order" data-type="xref">#chp-https://rdrr.io/r/base/order</a></code>:</p>
<p><code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code> is equivalent to subsetting the rows with an integer vector, usually created with <code><a href="https://rdrr.io/r/base/order.html">order()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df |&gt; arrange(x, y)
@@ -170,7 +170,7 @@ df[order(df$x, df$y), ]</pre>
<p>You can use <code>order(decreasing = TRUE)</code> to sort all columns in descending order, or <code>-rank(col)</code> to sort individual columns in decreasing order.</p>
</li>
<li>
<p>Both <code><a href="#chp-https://dplyr.tidyverse.org/reference/select" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/select</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/relocate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/relocate</a></code> are similar to subsetting the columns with a character vector:</p>
<p>Both <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code> are similar to subsetting the columns with a character vector:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df |&gt; select(x, z)
@@ -178,7 +178,7 @@ df[order(df$x, df$y), ]</pre>
df[, c("x", "z")]</pre>
</div>
</li>
</ul><p>Base R also provides a function that combines the features of <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/select" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/select</a></code><span data-type="footnote">But it doesn’t handle grouped data frames differently and it doesn’t support selection helper functions like <code><a href="#chp-https://tidyselect.r-lib.org/reference/starts_with" data-type="xref">#chp-https://tidyselect.r-lib.org/reference/starts_with</a></code>.</span> called <code><a href="#chp-https://rdrr.io/r/base/subset" data-type="xref">#chp-https://rdrr.io/r/base/subset</a></code>:</p>
</ul><p>Base R also provides a function that combines the features of <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code><span data-type="footnote">But it doesn’t handle grouped data frames differently and it doesn’t support selection helper functions like <code><a href="https://tidyselect.r-lib.org/reference/starts_with.html">starts_with()</a></code>.</span> called <code><a href="https://rdrr.io/r/base/subset.html">subset()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df |&gt;
filter(x &gt; 1) |&gt;
@@ -209,7 +209,7 @@ Exercises</h2>
<li>Every element except the last value.</li>
<li>Only even values (and no missing values).</li>
</ol></li>
<li><p>Why is <code>x[-which(x &gt; 0)]</code> not the same as <code>x[x &lt;= 0]</code>? Read the documentation for <code><a href="#chp-https://rdrr.io/r/base/which" data-type="xref">#chp-https://rdrr.io/r/base/which</a></code> and do some experiments to figure it out.</p></li>
<li><p>Why is <code>x[-which(x &gt; 0)]</code> not the same as <code>x[x &lt;= 0]</code>? Read the documentation for <code><a href="https://rdrr.io/r/base/which.html">which()</a></code> and do some experiments to figure it out.</p></li>
</ol></section>
</section>
@@ -222,7 +222,7 @@ Selecting a single element<code>$</code> and <code>[[</code>
<section id="data-frames" data-type="sect2">
<h2>
Data frames</h2>
<p><code>[[</code> and <code>$</code> can be used like <code><a href="#chp-https://dplyr.tidyverse.org/reference/pull" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/pull</a></code> to extract columns out of a data frame. <code>[[</code> can access by position or by name, and <code>$</code> is specialized for access by name:</p>
<p><code>[[</code> and <code>$</code> can be used like <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> to extract columns out of a data frame. <code>[[</code> can access by position or by name, and <code>$</code> is specialized for access by name:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">tb &lt;- tibble(
x = 1:4,
@@ -239,7 +239,7 @@ tb[["x"]]
tb$x
#&gt; [1] 1 2 3 4</pre>
</div>
<p>They can also be used to create new columns, the base R equivalent of <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code>:</p>
<p>They can also be used to create new columns, the base R equivalent of <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">tb$z &lt;- tb$x + tb$y
tb
@@ -251,8 +251,8 @@ tb
#&gt; 3 3 1 4
#&gt; 4 4 21 25</pre>
</div>
<p>There are a number of other base approaches to creating new columns, including <code><a href="#chp-https://rdrr.io/r/base/transform" data-type="xref">#chp-https://rdrr.io/r/base/transform</a></code>, <code><a href="#chp-https://rdrr.io/r/base/with" data-type="xref">#chp-https://rdrr.io/r/base/with</a></code>, and <code><a href="#chp-https://rdrr.io/r/base/with" data-type="xref">#chp-https://rdrr.io/r/base/with</a></code>. Hadley collected a few examples at <a href="https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf" class="uri">https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf</a>.</p>
<p>Using <code>$</code> directly is convenient when performing quick summaries. For example, if you just want to find the size of the biggest diamond or the possible values of <code>cut</code>, there’s no need to use <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code>:</p>
<p>There are a number of other base approaches to creating new columns, including <code><a href="https://rdrr.io/r/base/transform.html">transform()</a></code>, <code><a href="https://rdrr.io/r/base/with.html">with()</a></code>, and <code><a href="https://rdrr.io/r/base/with.html">within()</a></code>. Hadley collected a few examples at <a href="https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf" class="uri">https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf</a>.</p>
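<p>As a rough sketch of those alternatives (not taken from the gist; the data frame <code>df_demo</code> and its columns are made up for illustration), the following base R calls all compute the same new column:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># Hypothetical data frame, used only for this illustration
df_demo &lt;- data.frame(x = 1:3, y = c(10, 20, 30))

# transform() returns a copy of the data frame with the new column added
transform(df_demo, z = x + y)

# within() evaluates an assignment inside the data frame and returns the modified copy
within(df_demo, z &lt;- x + y)

# with() only evaluates the expression, so the result has to be assigned back by hand
df_demo$z &lt;- with(df_demo, x + y)</pre>
</div>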
<p>Using <code>$</code> directly is convenient when performing quick summaries. For example, if you just want to find the size of the biggest diamond or the possible values of <code>cut</code>, there’s no need to use <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">max(diamonds$carat)
#&gt; [1] 5.01
@@ -384,9 +384,9 @@ Exercises</h2>
<section id="apply-family" data-type="sect1">
<h1>
Apply family</h1>
<p>In <a href="#chp-iteration" data-type="xref">#chp-iteration</a>, you learned tidyverse techniques for iteration like <code><a href="#chp-https://dplyr.tidyverse.org/reference/across" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/across</a></code> and the map family of functions. In this section, you’ll learn about their base equivalents, the <strong>apply family</strong>. In this context, apply and map are synonyms because another way of saying “map a function over each element of a vector” is “apply a function over each element of a vector”. Here we’ll give you a quick overview of this family so you can recognize them in the wild.</p>
<p>The most important member of this family is <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code>, which is very similar to <code><a href="#chp-https://purrr.tidyverse.org/reference/map" data-type="xref">#chp-https://purrr.tidyverse.org/reference/map</a></code><span data-type="footnote">It just lacks convenient features like progress bars and reporting which element caused the problem if there’s an error.</span>. In fact, because we haven’t used any of <code><a href="#chp-https://purrr.tidyverse.org/reference/map" data-type="xref">#chp-https://purrr.tidyverse.org/reference/map</a></code>’s more advanced features, you can replace every <code><a href="#chp-https://purrr.tidyverse.org/reference/map" data-type="xref">#chp-https://purrr.tidyverse.org/reference/map</a></code> call in <a href="#chp-iteration" data-type="xref">#chp-iteration</a> with <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code>.</p>
<p>There’s no exact base R equivalent to <code><a href="#chp-https://dplyr.tidyverse.org/reference/across" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/across</a></code> but you can get close by using <code>[</code> with <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code>. This works because under the hood, data frames are lists of columns, so calling <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> on a data frame applies the function to each column.</p>
<p>In <a href="#chp-iteration" data-type="xref">#chp-iteration</a>, you learned tidyverse techniques for iteration like <code><a href="https://dplyr.tidyverse.org/reference/across.html">dplyr::across()</a></code> and the map family of functions. In this section, you’ll learn about their base equivalents, the <strong>apply family</strong>. In this context, apply and map are synonyms because another way of saying “map a function over each element of a vector” is “apply a function over each element of a vector”. Here we’ll give you a quick overview of this family so you can recognize them in the wild.</p>
<p>The most important member of this family is <code><a href="https://rdrr.io/r/base/lapply.html">lapply()</a></code>, which is very similar to <code><a href="https://purrr.tidyverse.org/reference/map.html">purrr::map()</a></code><span data-type="footnote">It just lacks convenient features like progress bars and reporting which element caused the problem if there’s an error.</span>. In fact, because we haven’t used any of <code><a href="https://purrr.tidyverse.org/reference/map.html">map()</a></code>’s more advanced features, you can replace every <code><a href="https://purrr.tidyverse.org/reference/map.html">map()</a></code> call in <a href="#chp-iteration" data-type="xref">#chp-iteration</a> with <code><a href="https://rdrr.io/r/base/lapply.html">lapply()</a></code>.</p>
<p>There’s no exact base R equivalent to <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code> but you can get close by using <code>[</code> with <code><a href="https://rdrr.io/r/base/lapply.html">lapply()</a></code>. This works because under the hood, data frames are lists of columns, so calling <code><a href="https://rdrr.io/r/base/lapply.html">lapply()</a></code> on a data frame applies the function to each column.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df &lt;- tibble(a = 1, b = 2, c = "a", d = "b", e = 4)
@@ -404,15 +404,15 @@ df
#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 2 4 a b 8</pre>
</div>
<p>The code above uses a new function, <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code>. It’s similar to <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> but it always tries to simplify the result, hence the <code>s</code> in its name, here producing a logical vector instead of a list. We don’t recommend using it for programming, because the simplification can fail and give you an unexpected type, but it’s usually fine for interactive use. purrr has a similar function called <code><a href="#chp-https://purrr.tidyverse.org/reference/map" data-type="xref">#chp-https://purrr.tidyverse.org/reference/map</a></code> that we didn’t mention in <a href="#chp-iteration" data-type="xref">#chp-iteration</a>.</p>
<p>Base R provides a stricter version of <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> called <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code>, short for <strong>v</strong>ector apply. It takes an additional argument that specifies the expected type, ensuring that simplification occurs the same way regardless of the input. For example, we could replace the <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> call above with this <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> where we specify that we expect <code><a href="#chp-https://rdrr.io/r/base/numeric" data-type="xref">#chp-https://rdrr.io/r/base/numeric</a></code> to return a logical vector of length 1:</p>
<p>The code above uses a new function, <code><a href="https://rdrr.io/r/base/lapply.html">sapply()</a></code>. It’s similar to <code><a href="https://rdrr.io/r/base/lapply.html">lapply()</a></code> but it always tries to simplify the result, hence the <code>s</code> in its name, here producing a logical vector instead of a list. We don’t recommend using it for programming, because the simplification can fail and give you an unexpected type, but it’s usually fine for interactive use. purrr has a similar function called <code><a href="https://purrr.tidyverse.org/reference/map.html">map_vec()</a></code> that we didn’t mention in <a href="#chp-iteration" data-type="xref">#chp-iteration</a>.</p>
<p>Base R provides a stricter version of <code><a href="https://rdrr.io/r/base/lapply.html">sapply()</a></code> called <code><a href="https://rdrr.io/r/base/lapply.html">vapply()</a></code>, short for <strong>v</strong>ector apply. It takes an additional argument that specifies the expected type, ensuring that simplification occurs the same way regardless of the input. For example, we could replace the <code><a href="https://rdrr.io/r/base/lapply.html">sapply()</a></code> call above with this <code><a href="https://rdrr.io/r/base/lapply.html">vapply()</a></code> where we specify that we expect <code><a href="https://rdrr.io/r/base/numeric.html">is.numeric()</a></code> to return a logical vector of length 1:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">vapply(df, is.numeric, logical(1))
#&gt; a b c d e
#&gt; TRUE TRUE FALSE FALSE TRUE</pre>
</div>
<p>The distinction between <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> and <code><a href="#chp-https://rdrr.io/r/base/lapply" data-type="xref">#chp-https://rdrr.io/r/base/lapply</a></code> is really important when they’re inside a function (because it makes a big difference to the function’s robustness to unusual inputs), but it doesn’t usually matter in data analysis.</p>
<p>Another important member of the apply family is <code><a href="#chp-https://rdrr.io/r/base/tapply" data-type="xref">#chp-https://rdrr.io/r/base/tapply</a></code> which computes a single grouped summary:</p>
<p>The distinction between <code><a href="https://rdrr.io/r/base/lapply.html">sapply()</a></code> and <code><a href="https://rdrr.io/r/base/lapply.html">vapply()</a></code> is really important when they’re inside a function (because it makes a big difference to the function’s robustness to unusual inputs), but it doesn’t usually matter in data analysis.</p>
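<p>A minimal sketch of how that simplification can change the result type (the input <code>empty</code> is made up for illustration): with an empty input, <code>sapply()</code> falls back to returning a list, while <code>vapply()</code> still returns the type that was declared.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># Hypothetical empty input, e.g. what a function might receive when no files match
empty &lt;- list()

# sapply() has no results to inspect, so it returns an empty list
class(sapply(empty, length))
#&gt; [1] "list"

# vapply() returns the declared type, here a zero-length integer vector
class(vapply(empty, length, integer(1)))
#&gt; [1] "integer"</pre>
</div>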
<p>Another important member of the apply family is <code><a href="https://rdrr.io/r/base/tapply.html">tapply()</a></code> which computes a single grouped summary:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">diamonds |&gt;
group_by(cut) |&gt;
@@ -430,8 +430,8 @@ tapply(diamonds$price, diamonds$cut, mean)
#&gt; Fair Good Very Good Premium Ideal
#&gt; 4358.758 3928.864 3981.760 4584.258 3457.542</pre>
</div>
<p>Unfortunately <code><a href="#chp-https://rdrr.io/r/base/tapply" data-type="xref">#chp-https://rdrr.io/r/base/tapply</a></code> returns its results in a named vector which requires some gymnastics if you want to collect multiple summaries and grouping variables into a data frame (it’s certainly possible to not do this and just work with free floating vectors, but in our experience that just delays the work). If you want to see how you might use <code><a href="#chp-https://rdrr.io/r/base/tapply" data-type="xref">#chp-https://rdrr.io/r/base/tapply</a></code> or other base techniques to perform other grouped summaries, Hadley has collected a few techniques <a href="#chp-https://gist.github.com/hadley/c430501804349d382ce90754936ab8ec" data-type="xref">#chp-https://gist.github.com/hadley/c430501804349d382ce90754936ab8ec</a>.</p>
<p>The final member of the apply family is the titular <code><a href="#chp-https://rdrr.io/r/base/apply" data-type="xref">#chp-https://rdrr.io/r/base/apply</a></code>, which works with matrices and arrays. In particular, watch out for <code>apply(df, 2, something)</code> which is a slow and potentially dangerous way of doing <code>lapply(df, something)</code>. This rarely comes up in data science because we usually work with data frames and not matrices.</p>
<p>Unfortunately <code><a href="https://rdrr.io/r/base/tapply.html">tapply()</a></code> returns its results in a named vector which requires some gymnastics if you want to collect multiple summaries and grouping variables into a data frame (it’s certainly possible to not do this and just work with free floating vectors, but in our experience that just delays the work). If you want to see how you might use <code><a href="https://rdrr.io/r/base/tapply.html">tapply()</a></code> or other base techniques to perform other grouped summaries, Hadley has collected a few techniques <a href="https://gist.github.com/hadley/c430501804349d382ce90754936ab8ec">in a gist</a>.</p>
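<p>One possible version of those gymnastics, sketched with a small made-up data frame (<code>sales</code>, <code>region</code>, and <code>amount</code> are hypothetical names, and this is not taken from the gist): take the groups from the names of the result and the summaries from its values.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># Hypothetical data, used only for this illustration
sales &lt;- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(1, 2, 3, 4)
)

# tapply() returns one mean per group, named by the group
means &lt;- tapply(sales$amount, sales$region, mean)

# Rebuild a data frame: names become the grouping column, values the summary
data.frame(region = names(means), mean_amount = as.vector(means))</pre>
</div>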
<p>The final member of the apply family is the titular <code><a href="https://rdrr.io/r/base/apply.html">apply()</a></code>, which works with matrices and arrays. In particular, watch out for <code>apply(df, 2, something)</code> which is a slow and potentially dangerous way of doing <code>lapply(df, something)</code>. This rarely comes up in data science because we usually work with data frames and not matrices.</p>
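<p>To illustrate why, here is a small sketch with a made-up data frame (<code>mixed</code> is a hypothetical name): <code>apply()</code> first converts the data frame to a matrix, and because a matrix can only hold one type, mixed columns are all coerced to character before the function sees them.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit"># Hypothetical data frame with one numeric and one character column
mixed &lt;- data.frame(a = 1, b = "x")

# apply() works on as.matrix(mixed), a character matrix, so nothing looks numeric
apply(mixed, 2, is.numeric)
#&gt; a b
#&gt; FALSE FALSE

# vapply() (like lapply()) visits the original columns, so the types are preserved
vapply(mixed, is.numeric, logical(1))
#&gt; a b
#&gt; TRUE FALSE</pre>
</div>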
</section>
<section id="for-loops" data-type="sect1">
@@ -443,7 +443,7 @@ For loops</h1>
# do something with element
}</pre>
</div>
<p>The most straightforward use of <code>for()</code> loops is to achieve the same effect as <code><a href="#chp-https://purrr.tidyverse.org/reference/map" data-type="xref">#chp-https://purrr.tidyverse.org/reference/map</a></code>: call some function with a side-effect on each element of a list. For example, in <a href="#sec-save-database" data-type="xref">#sec-save-database</a> instead of using walk:</p>
<p>The most straightforward use of <code>for()</code> loops is to achieve the same effect as <code><a href="https://purrr.tidyverse.org/reference/map.html">walk()</a></code>: call some function with a side-effect on each element of a list. For example, in <a href="#sec-save-database" data-type="xref">#sec-save-database</a> instead of using walk:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">paths |&gt; walk(append_file)</pre>
</div>
@@ -458,11 +458,11 @@ For loops</h1>
<pre data-type="programlisting" data-code-language="downlit">paths &lt;- dir("data/gapminder", pattern = "\\.xlsx$", full.names = TRUE)
files &lt;- map(paths, readxl::read_excel)</pre>
</div>
<p>There are a few different techniques that you can use, but we recommend being explicit about what the output is going to look like upfront. In this case, we’re going to want a list the same length as <code>paths</code>, which we can create with <code><a href="#chp-https://rdrr.io/r/base/vector" data-type="xref">#chp-https://rdrr.io/r/base/vector</a></code>:</p>
<p>There are a few different techniques that you can use, but we recommend being explicit about what the output is going to look like upfront. In this case, we’re going to want a list the same length as <code>paths</code>, which we can create with <code><a href="https://rdrr.io/r/base/vector.html">vector()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">files &lt;- vector("list", length(paths))</pre>
</div>
<p>Then instead of iterating over the elements of <code>paths</code>, we’ll iterate over their indices, using <code><a href="#chp-https://rdrr.io/r/base/seq" data-type="xref">#chp-https://rdrr.io/r/base/seq</a></code> to generate one index for each element of <code>paths</code>:</p>
<p>Then instead of iterating over the elements of <code>paths</code>, we’ll iterate over their indices, using <code><a href="https://rdrr.io/r/base/seq.html">seq_along()</a></code> to generate one index for each element of <code>paths</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">seq_along(paths)
#&gt; [1] 1 2 3 4 5 6 7 8 9 10 11 12</pre>
@@ -473,7 +473,7 @@ files &lt;- map(paths, readxl::read_excel)</pre>
files[[i]] &lt;- readxl::read_excel(paths[[i]])
}</pre>
</div>
<p>To combine the list of tibbles into a single tibble you can use <code><a href="#chp-https://rdrr.io/r/base/do.call" data-type="xref">#chp-https://rdrr.io/r/base/do.call</a></code> + <code><a href="#chp-https://rdrr.io/r/base/cbind" data-type="xref">#chp-https://rdrr.io/r/base/cbind</a></code>:</p>
<p>To combine the list of tibbles into a single tibble you can use <code><a href="https://rdrr.io/r/base/do.call.html">do.call()</a></code> + <code><a href="https://rdrr.io/r/base/cbind.html">rbind()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">do.call(rbind, files)
#&gt; # A tibble: 1,704 × 5
@@ -501,7 +501,7 @@ for (path in paths) {
<h1>
Plots</h1>
<p>Many R users who don’t otherwise use the tidyverse prefer ggplot2 for plotting due to helpful features like sensible defaults, automatic legends, and a modern look. However, base R plotting functions can still be useful because they’re so concise: it takes very little typing to do a basic exploratory plot.</p>
<p>There are two main types of base plot you’ll see in the wild: scatterplots and histograms, produced with <code><a href="#chp-https://rdrr.io/r/graphics/plot.default" data-type="xref">#chp-https://rdrr.io/r/graphics/plot.default</a></code> and <code><a href="#chp-https://rdrr.io/r/graphics/hist" data-type="xref">#chp-https://rdrr.io/r/graphics/hist</a></code> respectively. Here’s a quick example from the diamonds dataset:</p>
<p>There are two main types of base plot you’ll see in the wild: scatterplots and histograms, produced with <code><a href="https://rdrr.io/r/graphics/plot.default.html">plot()</a></code> and <code><a href="https://rdrr.io/r/graphics/hist.html">hist()</a></code> respectively. Here’s a quick example from the diamonds dataset:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">hist(diamonds$carat)