More minor page count tweaks & fixes

And re-convert with latest htmlbook
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions
--- a/oreilly/data-visualize.html
+++ b/oreilly/data-visualize.html
@@ -1,6 +1,6 @@
 <section data-type="chapter" id="chp-data-visualize">
 <h1><span id="sec-data-visualization" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data visualization</span></span></h1>
-<section id="introduction" data-type="sect1">
+<section id="data-visualize-introduction" data-type="sect1">
 <h1>
 Introduction</h1>
 <blockquote class="blockquote">
@@ -9,30 +9,30 @@ Introduction</h1>
 <p>R has several systems for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2 implements the <strong>grammar of graphics</strong>, a coherent system for describing and building graphs. With ggplot2, you can do more, and do it faster, by learning one system and applying it in many places.</p>
 <p>This chapter will teach you how to visualize your data using <strong>ggplot2</strong>. We will start by creating a simple scatterplot and use that to introduce aesthetic mappings and geometric objects – the fundamental building blocks of ggplot2. We will then walk you through visualizing distributions of single variables as well as visualizing relationships between two or more variables. We’ll finish off with saving your plots and troubleshooting tips.</p>

-<section id="prerequisites" data-type="sect2">
+<section id="data-visualize-prerequisites" data-type="sect2">
 <h2>
 Prerequisites</h2>
-<p>This chapter focuses on ggplot2, one of the core packages in the tidyverse. To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running this code:</p>
+<p>This chapter focuses on ggplot2, one of the core packages in the tidyverse. To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">library(tidyverse)
 #&gt; ── Attaching core tidyverse packages ──────────────── tidyverse 1.3.2.9000 ──
 #&gt; ✔ dplyr     1.0.99.9000     ✔ readr     2.1.3      
-#&gt; ✔ forcats   0.5.2.9000      ✔ stringr   1.5.0.9000 
+#&gt; ✔ forcats   0.5.2           ✔ stringr   1.5.0      
 #&gt; ✔ ggplot2   3.4.0.9000      ✔ tibble    3.1.8      
-#&gt; ✔ lubridate 1.9.0           ✔ tidyr     1.2.1.9001 
+#&gt; ✔ lubridate 1.9.0           ✔ tidyr     1.3.0      
 #&gt; ✔ purrr     1.0.1           
 #&gt; ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
 #&gt; ✖ dplyr::filter() masks stats::filter()
 #&gt; ✖ dplyr::lag()    masks stats::lag()
-#&gt; ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors</pre>
+#&gt; ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</pre>
 </div>
-<p>That one line of code loads the core tidyverse; packages which you will use in almost every data analysis. It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).</p>
+<p>That one line of code loads the core tidyverse: the packages that you will use in almost every data analysis. It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded)<span data-type="footnote">You can eliminate that message and force conflict resolution to happen on demand by using the conflicted package, which becomes more important as you load more packages. You can learn more about conflicted at <a href="https://conflicted.r-lib.org" class="uri">https://conflicted.r-lib.org</a>.</span>.</p>
 <p>If you run this code and get the error message <code>there is no package called 'tidyverse'</code>, you’ll need to first install it, then run <code><a href="https://rdrr.io/r/base/library.html">library()</a></code> once again.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">install.packages("tidyverse")
 library(tidyverse)</pre>
 </div>
-<p>You only need to install a package once, but you need to reload it every time you start a new session.</p>
+<p>You only need to install a package once, but you need to load it every time you start a new session.</p>
 <p>In addition to tidyverse, we will also use the <strong>palmerpenguins</strong> package, which includes the <code>penguins</code> dataset containing body measurements for penguins on three islands in the Palmer Archipelago.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">library(palmerpenguins)</pre>
@@ -47,20 +47,21 @@ First steps</h1>

 <section id="the-penguins-data-frame" data-type="sect2">
 <h2>
-The<code>penguins</code> data frame</h2>
+The penguins data frame</h2>
 <p>You can test your answer with the <code>penguins</code> <strong>data frame</strong> found in palmerpenguins (a.k.a. <code><a href="https://allisonhorst.github.io/palmerpenguins/reference/penguins.html">palmerpenguins::penguins</a></code>). A data frame is a rectangular collection of variables (in the columns) and observations (in the rows). <code>penguins</code> contains 344 observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER<span data-type="footnote">Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. <a href="https://allisonhorst.github.io/palmerpenguins/" class="uri">https://allisonhorst.github.io/palmerpenguins/</a>. doi: 10.5281/zenodo.3960218.</span>.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">penguins
 #&gt; # A tibble: 344 × 8
-#&gt;   species island   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
-#&gt;   &lt;fct&gt;   &lt;fct&gt;             &lt;dbl&gt;         &lt;dbl&gt;             &lt;int&gt;       &lt;int&gt;
-#&gt; 1 Adelie  Torgers…           39.1          18.7               181        3750
-#&gt; 2 Adelie  Torgers…           39.5          17.4               186        3800
-#&gt; 3 Adelie  Torgers…           40.3          18                 195        3250
-#&gt; 4 Adelie  Torgers…           NA            NA                  NA          NA
-#&gt; 5 Adelie  Torgers…           36.7          19.3               193        3450
-#&gt; 6 Adelie  Torgers…           39.3          20.6               190        3650
-#&gt; # … with 338 more rows, and 2 more variables: sex &lt;fct&gt;, year &lt;int&gt;</pre>
+#&gt;   species island    bill_length_mm bill_depth_mm flipper_length_mm
+#&gt;   &lt;fct&gt;   &lt;fct&gt;              &lt;dbl&gt;         &lt;dbl&gt;             &lt;int&gt;
+#&gt; 1 Adelie  Torgersen           39.1          18.7               181
+#&gt; 2 Adelie  Torgersen           39.5          17.4               186
+#&gt; 3 Adelie  Torgersen           40.3          18                 195
+#&gt; 4 Adelie  Torgersen           NA            NA                  NA
+#&gt; 5 Adelie  Torgersen           36.7          19.3               193
+#&gt; 6 Adelie  Torgersen           39.3          20.6               190
+#&gt; # … with 338 more rows, and 3 more variables: body_mass_g &lt;int&gt;, sex &lt;fct&gt;,
+#&gt; #   year &lt;int&gt;</pre>
 </div>
 <p>This data frame contains 8 columns. For an alternative view, where you can see all variables and the first few observations of each variable, use <code><a href="https://pillar.r-lib.org/reference/glimpse.html">glimpse()</a></code>. Or, if you’re in RStudio, run <code>View(penguins)</code> to open an interactive data viewer.</p>
 <div class="cell">
@@ -239,7 +240,7 @@ Adding aesthetics and layers</h2>
 <p>We finally have a plot that perfectly matches our “ultimate goal”!</p>
 </section>

-<section id="exercises" data-type="sect2">
+<section id="data-visualize-exercises" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>How many rows are in <code>penguins</code>? How many columns?</p></li>
@@ -410,7 +411,7 @@ ggplot(penguins, aes(x = body_mass_g)) +
 </div>
 </section>

-<section id="exercises-1" data-type="sect2">
+<section id="data-visualize-exercises-1" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>Make a bar plot of <code>species</code> of <code>penguins</code>, where you assign <code>species</code> to the <code>y</code> aesthetic. How is this plot different?</p></li>
@@ -479,7 +480,7 @@ A numerical and a categorical variable</h2>
 <li>Otherwise, we <em>set</em> the value of an aesthetic.</li>
 </ul></section>

-<section id="two-categorical-variables" data-type="sect2">
+<section id="data-visualize-two-categorical-variables" data-type="sect2">
 <h2>
 Two categorical variables</h2>
 <p>We can use segmented bar plots to visualize the relationship between two categorical variables. To create this bar chart, we map the variable that defines the bars to the <code>x</code> aesthetic and the variable that subdivides each bar to the <code>fill</code> aesthetic.</p>
@@ -498,7 +499,7 @@ ggplot(penguins, aes(x = island, fill = species)) +
 </div>
 </section>

-<section id="two-numerical-variables" data-type="sect2">
+<section id="data-visualize-two-numerical-variables" data-type="sect2">
 <h2>
 Two numerical variables</h2>
 <p>So far you’ve learned about scatterplots (created with <code><a href="https://ggplot2.tidyverse.org/reference/geom_point.html">geom_point()</a></code>) and smooth curves (created with <code><a href="https://ggplot2.tidyverse.org/reference/geom_smooth.html">geom_smooth()</a></code>) for visualizing the relationship between two numerical variables. A scatterplot is probably the most commonly used plot for visualizing the relationship between two variables.</p>
@@ -535,7 +536,7 @@ Three or more variables</h2>
 <p>You will learn about many other geoms for visualizing distributions of variables and relationships between them in <a href="#chp-layers" data-type="xref">#chp-layers</a>.</p>
 </section>

-<section id="exercises-2" data-type="sect2">
+<section id="data-visualize-exercises-2" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>Which variables in <code>mpg</code> are categorical? Which variables are continuous? (Hint: type <code><a href="https://ggplot2.tidyverse.org/reference/mpg.html">?mpg</a></code> to read the documentation for the dataset). How can you see this information when you run <code>mpg</code>?</p></li>
@@ -576,7 +577,7 @@ ggsave(filename = "my-plot.png")</pre>
 <p>If you don’t specify the <code>width</code> and <code>height</code> they will be taken from the dimensions of the current plotting device. For reproducible code, you’ll want to specify them. You can learn more about <code><a href="https://ggplot2.tidyverse.org/reference/ggsave.html">ggsave()</a></code> in the documentation.</p>
 <p>Generally, however, we recommend that you assemble your final reports using Quarto, a reproducible authoring system that allows you to interleave your code and your prose and automatically include your plots in your write-ups. You will learn more about Quarto in <a href="#chp-quarto" data-type="xref">#chp-quarto</a>.</p>

-<section id="exercises-3" data-type="sect2">
+<section id="data-visualize-exercises-3" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li>
@@ -607,7 +608,7 @@ Common problems</h1>
 <p>If that doesn’t help, carefully read the error message. Sometimes the answer will be buried there! But when you’re new to R, even if the answer is in the error message, you might not yet know how to understand it. Another great tool is Google: try googling the error message, as it’s likely someone else has had the same problem, and has gotten help online.</p>
 </section>

-<section id="summary" data-type="sect1">
+<section id="data-visualize-summary" data-type="sect1">
 <h1>
 Summary</h1>
 <p>In this chapter, you’ve learned the basics of data visualization with ggplot2. We started with the basic idea that underpins ggplot2: a visualization is a mapping from variables in your data to aesthetic properties like position, color, size and shape. You then learned about increasing the complexity and improving the presentation of your plots layer-by-layer. You also learned about commonly used plots for visualizing the distribution of a single variable as well as for visualizing relationships between two or more variables, by leveraging additional aesthetic mappings and/or splitting your plot into small multiples using faceting.</p>