More minor page count tweaks & fixes
And re-convert with latest htmlbook
@@ -1,12 +1,12 @@
|
||||
<section data-type="chapter" id="chp-missing-values">
|
||||
<h1><span id="sec-missing-values" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Missing values</span></span></h1>
|
||||
<section id="introduction" data-type="sect1">
|
||||
<section id="missing-values-introduction" data-type="sect1">
|
||||
<h1>
|
||||
Introduction</h1>
|
||||
<p>You’ve already learned the basics of missing values earlier in the book. You first saw them in <a href="#chp-data-visualize" data-type="xref">#chp-data-visualize</a> where they resulted in a warning when making a plot as well as in <a href="#sec-summarize" data-type="xref">#sec-summarize</a> where they interfered with computing summary statistics, and you learned about their infectious nature and how to check for their presence in <a href="#sec-na-comparison" data-type="xref">#sec-na-comparison</a>. Now we’ll come back to them in more depth, so you can learn more of the details.</p>
|
||||
<p>We’ll start by discussing some general tools for working with missing values recorded as <code>NA</code>s. We’ll then explore the idea of implicitly missing values, values that are simply absent from your data, and show some tools you can use to make them explicit. We’ll finish off with a related discussion of empty groups, caused by factor levels that don’t appear in the data.</p>
|
||||
|
||||
<section id="prerequisites" data-type="sect2">
|
||||
<section id="missing-values-prerequisites" data-type="sect2">
|
||||
<h2>
|
||||
Prerequisites</h2>
|
||||
<p>The functions for working with missing data mostly come from dplyr and tidyr, which are core members of the tidyverse.</p>
|
||||
@@ -173,11 +173,11 @@ Complete</h2>
|
||||
<p>In some cases, the complete set of observations can’t be generated by a simple combination of variables. In that case, you can do manually what <code><a href="https://tidyr.tidyverse.org/reference/complete.html">complete()</a></code> does for you: create a data frame that contains all the rows that should exist (using whatever combination of techniques you need), then combine it with your original dataset with <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">dplyr::full_join()</a></code>.</p>
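<p>The manual approach described above might look like the following minimal sketch. The <code>sales</code> data and the <code>all_rows</code> helper are hypothetical and used only for illustration; <code>expand_grid()</code> stands in for whatever technique you need to build the rows that should exist.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(dplyr)
library(tidyr)

# Hypothetical data: only three of the eight year/quarter combinations are present
sales &lt;- tibble(
  year  = c(2020, 2020, 2021),
  qtr   = c(1, 4, 2),
  price = c(1.88, 0.59, 0.92)
)

# Every row that should exist, built by hand
all_rows &lt;- expand_grid(year = c(2020, 2021), qtr = c(1, 2, 3, 4))

# Rows with no match in sales get an explicit NA for price
sales_full &lt;- all_rows |&gt;
  full_join(sales, by = c("year", "qtr"))

sales_full</pre>
</div>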
|
||||
</section>
|
||||
|
||||
<section id="joins" data-type="sect2">
|
||||
<section id="missing-values-joins" data-type="sect2">
|
||||
<h2>
|
||||
Joins</h2>
|
||||
<p>This brings us to another important way of revealing implicitly missing observations: joins. You’ll learn more about joins in <a href="#chp-joins" data-type="xref">#chp-joins</a>, but we wanted to quickly mention them to you here since you can often only know that values are missing from one dataset when you compare it to another.</p>
|
||||
<p><code>dplyr::anti_join(x, y)</code> is a particularly useful tool here because it selects only the rows in <code>x</code> that don’t have a match in <code>y</code>. For example, we can use two <code><a href="https://dplyr.tidyverse.org/reference/filter-joins.html">anti_join()</a></code>s reveal to reveal that we’re missing information for four airports and 722 planes mentioned in <code>flights</code>:</p>
|
||||
<p><code>dplyr::anti_join(x, y)</code> is a particularly useful tool here because it selects only the rows in <code>x</code> that don’t have a match in <code>y</code>. For example, we can use two <code><a href="https://dplyr.tidyverse.org/reference/filter-joins.html">anti_join()</a></code>s to reveal that we’re missing information for four airports and 722 planes mentioned in <code>flights</code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">library(nycflights13)
|
||||
|
||||
@@ -210,7 +210,7 @@ flights |>
|
||||
</div>
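<p>To see the mechanics without loading nycflights13, here is a small sketch of the same pattern. The <code>trips</code> and <code>airports_known</code> tibbles are hypothetical stand-ins for <code>flights</code> and <code>airports</code>.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(dplyr)

# Hypothetical data: four trips, but metadata for only two destinations
trips &lt;- tibble(trip = 1:4, dest = c("JFK", "BQN", "LGA", "SJU"))
airports_known &lt;- tibble(
  faa  = c("JFK", "LGA"),
  name = c("John F Kennedy Intl", "La Guardia")
)

# Keep only the destinations that have no match in airports_known
missing_airports &lt;- trips |&gt;
  distinct(faa = dest) |&gt;
  anti_join(airports_known, by = "faa")

missing_airports</pre>
</div>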
|
||||
</section>
|
||||
|
||||
<section id="exercises" data-type="sect2">
|
||||
<section id="missing-values-exercises" data-type="sect2">
|
||||
<h2>
|
||||
Exercises</h2>
|
||||
<ol type="1"><li>Can you find any relationship between the carrier and the rows that appear to be missing from <code>planes</code>?</li>
|
||||
@@ -323,7 +323,7 @@ length(x2)
|
||||
<p>The main drawback of this approach is that you get an <code>NA</code> for the count, even though you know that it should be zero.</p>
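<p>One possible way to work around this, sketched here under assumptions rather than taken from the text above, is the <code>fill</code> argument of <code>complete()</code>, which supplies a replacement value for the rows it adds. The <code>health</code> data below is a hypothetical example with a factor level that never occurs.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(dplyr)
library(tidyr)

# Hypothetical data: the "yes" level of smoker never appears
health &lt;- tibble(
  name   = c("Ikaika", "Oletta", "Leriah"),
  smoker = factor(c("no", "no", "no"), levels = c("yes", "no")),
  age    = c(34, 88, 75)
)

# complete() adds a row for the empty level; fill sets its count to 0
# while leaving mean_age as NA, since there is no age to average
smoker_summary &lt;- health |&gt;
  group_by(smoker) |&gt;
  summarize(n = n(), mean_age = mean(age)) |&gt;
  complete(smoker, fill = list(n = 0L))

smoker_summary</pre>
</div>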
|
||||
</section>
|
||||
|
||||
<section id="summary" data-type="sect1">
|
||||
<section id="missing-values-summary" data-type="sect1">
|
||||
<h1>
|
||||
Summary</h1>
|
||||
<p>Missing values are weird! Sometimes they’re recorded as an explicit <code>NA</code> but other times you only notice them by their absence. This chapter has given you some tools for working with explicit missing values and for uncovering implicit missing values, and has discussed some of the ways that implicit missing values can become explicit and vice versa.</p>
|
||||
|
||||