More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions

@@ -1,6 +1,6 @@
<section data-type="chapter" id="chp-base-R">
<h1><span id="sec-base-r" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">A field guide to base R</span></span></h1><p>To finish off the programming section, we're going to give you a quick tour of the most important base R functions that we don't otherwise discuss in the book. These tools are particularly useful as you do more programming and will help you read code that you'll encounter in the wild.</p><p>This is a good place to remind you that the tidyverse is not the only way to solve data science problems. We teach the tidyverse in this book because tidyverse packages share a common design philosophy, which increases the consistency across functions, making each new function or package a little easier to learn and use. It's not possible to use the tidyverse without using base R, so we've actually already taught you a <strong>lot</strong> of base R functions: from <code><a href="https://rdrr.io/r/base/library.html">library()</a></code> to load packages, to <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code> and <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code> for numeric summaries, to the factor, date, and POSIXct data types, and of course all the basic operators like <code>+</code>, <code>-</code>, <code>/</code>, <code>*</code>, <code>|</code>, <code>&amp;</code>, and <code>!</code>. What we haven't focused on so far is base R workflows, so we will highlight a few of those in this chapter.</p><p>After you read this book you'll learn other approaches to the same problems using base R, data.table, and other packages. You'll certainly encounter these other approaches when you start reading R code written by other people, particularly if you're using StackOverflow. 
It's 100% okay to write code that uses a mix of approaches, and don't let anyone tell you otherwise!</p><p>In this chapter, we'll focus on four big topics: subsetting with <code>[</code>, subsetting with <code>[[</code> and <code>$</code>, the apply family of functions, and <code>for</code> loops. To finish off, we'll briefly discuss two important plotting functions.</p>
<section id="prerequisites" data-type="sect2">
<section id="base-R-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<div class="cell">
@@ -10,7 +10,7 @@ Prerequisites</h2>
<section id="sec-subset-many" data-type="sect1">
<h1>
Selecting multiple elements with<code>[</code>
Selecting multiple elements with [
</h1>
<p><code>[</code> is used to extract sub-components from vectors and data frames, and is called like <code>x[i]</code> or <code>x[i, j]</code>. In this section, we'll introduce you to the power of <code>[</code>, first showing you how you can use it with vectors, then how the same principles extend in a straightforward way to two-dimensional (2d) structures like data frames. We'll then help you cement that knowledge by showing how various dplyr verbs are special cases of <code>[</code>.</p>
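<p>As a rough illustration of the two call forms, <code>x[i]</code> and <code>x[i, j]</code>, the sketch below uses a made-up named vector <code>x</code> and a made-up data frame <code>df</code>. Both objects and their values are assumptions for this example only, not data taken from the chapter.</p>

```r
# Hypothetical example vector; the names and values are for illustration only.
x <- c(a = 10, b = 3, c = NA, d = 5, e = 8)

x[c(1, 3)]            # positive integers keep the elements at those positions
x[-1]                 # negative integers drop the elements at those positions
x[!is.na(x) & x > 4]  # a logical vector keeps the elements where it is TRUE
x[c("a", "e")]        # a character vector selects elements by name

# The same ideas carry over to data frames via x[i, j]: rows first, then columns.
# Hypothetical data frame, again for illustration only.
df <- data.frame(x = 1:3, y = c("a", "e", "f"), z = c(0.5, 0.1, 0.9))

df[df$x > 1, c("y", "z")]  # rows where x > 1, columns y and z
```

<p>The last line selects the same rows and columns as <code>subset(df, x &gt; 1, c(y, z))</code>, just spelled out with <code>[</code>.</p>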
@@ -188,7 +188,7 @@ df |&gt; subset(x &gt; 1, c(y, z))
<p>This function was the inspiration for much of dplyr's syntax.</p>
</section>
<section id="exercises" data-type="sect2">
<section id="base-R-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
@@ -203,7 +203,7 @@ Exercises</h2>
<section id="sec-subset-one" data-type="sect1">
<h1>
Selecting a single element<code>$</code> and <code>[[</code>
Selecting a single element with $ and [[
</h1>
<p><code>[</code>, which selects many elements, is paired with <code>[[</code> and <code>$</code>, which extract a single element. In this section, we'll show you how to use <code>[[</code> and <code>$</code> to pull columns out of data frames, discuss a couple more differences between <code>data.frames</code> and tibbles, and emphasize some important differences between <code>[</code> and <code>[[</code> when used with lists.</p>
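<p>A minimal sketch of pulling a single column out of a data frame, assuming a small made-up base <code>data.frame</code> named <code>df</code> (the column names <code>x1</code> and <code>y</code> are hypothetical). It uses base R only, so it shows the <code>data.frame</code> side of the comparison and not the tibble side.</p>

```r
# Hypothetical data frame; column names and values are for illustration only.
df <- data.frame(x1 = 1:3, y = c(2.5, 4, 1))

df[["y"]]  # extract one column by name: returns a plain numeric vector
df[[1]]    # extract one column by position: returns the x1 vector
df$y       # $ is shorthand for extraction by name

df["y"]    # by contrast, [ keeps the data frame structure (one column)

# With a base data.frame, $ matches column names partially,
# so df$x silently returns the x1 column.
df$x
```

<p>Partial matching with <code>$</code> is one reason to prefer <code>[[</code> with the full column name in code that needs to be robust.</p>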
@@ -284,7 +284,7 @@ tb$z
<p>For this reason we sometimes joke that tibbles are lazy and surly: they do less and complain more.</p>
</section>
<section id="lists" data-type="sect2">
<section id="base-R-lists" data-type="sect2">
<h2>
Lists</h2>
<p><code>[[</code> and <code>$</code> are also really important for working with lists, and it's important to understand how they differ from <code>[</code>. Let's illustrate the differences with a list named <code>l</code>:</p>
@@ -372,7 +372,7 @@ df[["x"]]
</div>
</section>
<section id="exercises-1" data-type="sect2">
<section id="base-R-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>What happens when you use <code>[[</code> with a positive integer that's bigger than the length of the vector? What happens when you subset with a name that doesn't exist?</p></li>
@@ -515,7 +515,7 @@ plot(diamonds$carat, diamonds$price)</pre>
<p>Note that base plotting functions work with vectors, so you need to pull columns out of the data frame using <code>$</code> or some other technique.</p>
</section>
<section id="summary" data-type="sect1">
<section id="base-R-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, we've shown you a selection of base R functions useful for subsetting and iteration. Compared to approaches discussed elsewhere in the book, these functions tend to have more of a “vector” flavor than a “data frame” flavor because base R functions tend to take individual vectors, rather than a data frame and some column specification. This often makes life easier for programming and so becomes more important as you write more functions and begin to write your own packages.</p>