More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions

@@ -1,12 +1,12 @@
<section data-type="chapter" id="chp-missing-values">
<h1><span id="sec-missing-values" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Missing values</span></span></h1>
<section id="introduction" data-type="sect1">
<section id="missing-values-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>You've already learned the basics of missing values earlier in the book. You first saw them in <a href="#chp-data-visualize" data-type="xref">#chp-data-visualize</a>, where they resulted in a warning when making a plot, as well as in <a href="#sec-summarize" data-type="xref">#sec-summarize</a>, where they interfered with computing summary statistics, and you learned about their infectious nature and how to check for their presence in <a href="#sec-na-comparison" data-type="xref">#sec-na-comparison</a>. Now we'll come back to them in more depth, so you can learn more of the details.</p>
<p>We'll start by discussing some general tools for working with missing values recorded as <code>NA</code>s. We'll then explore the idea of implicitly missing values, values that are simply absent from your data, and show some tools you can use to make them explicit. We'll finish off with a related discussion of empty groups, caused by factor levels that don't appear in the data.</p>
<section id="prerequisites" data-type="sect2">
<section id="missing-values-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>The functions for working with missing data mostly come from dplyr and tidyr, which are core members of the tidyverse.</p>
@@ -173,11 +173,11 @@ Complete</h2>
<p>In some cases, the complete set of observations can't be generated by a simple combination of variables. In that case, you can do manually what <code><a href="https://tidyr.tidyverse.org/reference/complete.html">complete()</a></code> does for you: create a data frame that contains all the rows that should exist (using whatever combination of techniques you need), then combine it with your original dataset with <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">dplyr::full_join()</a></code>.</p>
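<p>As a minimal sketch of that manual approach, assume a small made-up <code>sales</code> tibble (hypothetical, not one of the book's datasets) whose valid quarters run only from 2020 Q1 to 2021 Q2, so a plain crossing of <code>year</code> and <code>qtr</code> would generate rows that shouldn't exist. You might write out the expected rows by hand and join them back on:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(dplyr)

# Hypothetical data: prices were recorded for only some quarters
sales &lt;- tibble(
  year  = c(2020, 2020, 2021),
  qtr   = c(1, 3, 2),
  price = c(1.9, 0.6, 0.9)
)

# Every row that should exist, listed by hand (2020 Q1 through 2021 Q2)
expected &lt;- tibble(
  year = c(2020, 2020, 2020, 2020, 2021, 2021),
  qtr  = c(1, 2, 3, 4, 1, 2)
)

# Rows found only in `expected` are added with price = NA
completed &lt;- sales |&gt;
  full_join(expected, by = c("year", "qtr")) |&gt;
  arrange(year, qtr)

completed</pre>
</div>
<p>The three quarters that appear only in <code>expected</code> come back with <code>NA</code> in <code>price</code>, which turns the previously implicit missing values into explicit ones.</p>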
</section>
<section id="joins" data-type="sect2">
<section id="missing-values-joins" data-type="sect2">
<h2>
Joins</h2>
<p>This brings us to another important way of revealing implicitly missing observations: joins. You'll learn more about joins in <a href="#chp-joins" data-type="xref">#chp-joins</a>, but we wanted to quickly mention them to you here since you can often only know that values are missing from one dataset when you compare it to another.</p>
<p><code>dplyr::anti_join(x, y)</code> is a particularly useful tool here because it selects only the rows in <code>x</code> that don't have a match in <code>y</code>. For example, we can use two <code><a href="https://dplyr.tidyverse.org/reference/filter-joins.html">anti_join()</a></code>s reveal to reveal that we're missing information for four airports and 722 planes mentioned in <code>flights</code>:</p>
<p><code>dplyr::anti_join(x, y)</code> is a particularly useful tool here because it selects only the rows in <code>x</code> that don't have a match in <code>y</code>. For example, we can use two <code><a href="https://dplyr.tidyverse.org/reference/filter-joins.html">anti_join()</a></code>s to reveal that we're missing information for four airports and 722 planes mentioned in <code>flights</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(nycflights13)
@@ -210,7 +210,7 @@ flights |&gt;
</div>
</section>
<section id="exercises" data-type="sect2">
<section id="missing-values-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>Can you find any relationship between the carrier and the rows that appear to be missing from <code>planes</code>?</li>
@@ -323,7 +323,7 @@ length(x2)
<p>The main drawback of this approach is that you get an <code>NA</code> for the count, even though you know that it should be zero.</p>
</section>
<section id="summary" data-type="sect1">
<section id="missing-values-summary" data-type="sect1">
<h1>
Summary</h1>
<p>Missing values are weird! Sometimes they're recorded as an explicit <code>NA</code>, but other times you only notice them by their absence. This chapter has given you some tools for working with explicit missing values and for uncovering implicit missing values, and has discussed some of the ways that implicit missing values can become explicit and vice versa.</p>