Re-render book for O'Reilly

Hadley Wickham
2023-01-12 17:22:57 -06:00
parent 28671ed8bd
commit 360d65ae47
113 changed files with 4957 additions and 2997 deletions

@@ -3,7 +3,7 @@
<section id="introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>You've already learned the basics of missing values earlier in the book. You first saw them in <a href="#sec-summarize" data-type="xref">#sec-summarize</a> where they interfered with computing summary statistics, and you learned about their infectious nature and how to check for their presence in <a href="#sec-na-comparison" data-type="xref">#sec-na-comparison</a>. Now we'll come back to them in more depth, so you can learn more of the details.</p>
<p>You've already learned the basics of missing values earlier in the book. You first saw them in <a href="#chp-data-visualize" data-type="xref">#chp-data-visualize</a> where they resulted in a warning when making a plot as well as in <a href="#sec-summarize" data-type="xref">#sec-summarize</a> where they interfered with computing summary statistics, and you learned about their infectious nature and how to check for their presence in <a href="#sec-na-comparison" data-type="xref">#sec-na-comparison</a>. Now we'll come back to them in more depth, so you can learn more of the details.</p>
<p>We'll start by discussing some general tools for working with missing values recorded as <code>NA</code>s. We'll then explore the idea of implicitly missing values, values that are simply absent from your data, and show some tools you can use to make them explicit. We'll finish off with a related discussion of empty groups, caused by factor levels that don't appear in the data.</p>
<section id="prerequisites" data-type="sect2">
@@ -247,11 +247,11 @@ Factors and empty groups</h1>
</div>
<p>The same principle applies to ggplot2's discrete axes, which will also drop levels that don't have any values. You can force them to display by supplying <code>drop = FALSE</code> to the appropriate discrete axis:</p>
<div>
<pre data-type="programlisting" data-code-language="r">ggplot(health, aes(smoker)) +
<pre data-type="programlisting" data-code-language="r">ggplot(health, aes(x = smoker)) +
geom_bar() +
scale_x_discrete()
ggplot(health, aes(smoker)) +
ggplot(health, aes(x = smoker)) +
geom_bar() +
scale_x_discrete(drop = FALSE)</pre>
<div class="cell quarto-layout-panel">
@@ -269,16 +269,16 @@ ggplot(health, aes(smoker)) +
<div class="cell">
<pre data-type="programlisting" data-code-language="r">health |&gt;
group_by(smoker, .drop = FALSE) |&gt;
summarise(
summarize(
n = n(),
mean_age = mean(age),
min_age = min(age),
max_age = max(age),
sd_age = sd(age)
)
#&gt; Warning: There were 2 warnings in `summarise()`.
#&gt; Warning: There were 2 warnings in `summarize()`.
#&gt; The first warning was:
#&gt; In argument `min_age = min(age)`.
#&gt; In argument: `min_age = min(age)`.
#&gt; In group 1: `smoker = yes`.
#&gt; Caused by warning in `min()`:
#&gt; ! no non-missing arguments to min; returning Inf
@@ -306,7 +306,7 @@ length(x2)
<div class="cell">
<pre data-type="programlisting" data-code-language="r">health |&gt;
group_by(smoker) |&gt;
summarise(
summarize(
n = n(),
mean_age = mean(age),
min_age = min(age),