More minor page count tweaks & fixes

And re-convert with latest htmlbook
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions
--- a/oreilly/numbers.html
+++ b/oreilly/numbers.html
@@ -1,22 +1,24 @@
 <section data-type="chapter" id="chp-numbers">
 <h1><span id="sec-numbers" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Numbers</span></span></h1>
-<section id="introduction" data-type="sect1">
+<section id="numbers-introduction" data-type="sect1">
 <h1>
 Introduction</h1>
 <p>Numeric vectors are the backbone of data science, and you’ve already used them a bunch of times earlier in the book. Now it’s time to systematically survey what you can do with them in R, ensuring that you’re well situated to tackle any future problem involving numeric vectors.</p>
 <p>We’ll start by giving you a couple of tools to make numbers if you have strings, and then going into a little more detail of <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>. Then we’ll dive into various numeric transformations that pair well with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, including more general transformations that can be applied to other types of vector, but are often used with numeric vectors. We’ll finish off by covering the summary functions that pair well with <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> and show you how they can also be used with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>.</p>

-<section id="prerequisites" data-type="sect2">
+<section id="numbers-prerequisites" data-type="sect2">
 <h2>
 Prerequisites</h2>
-<div data-type="important"><div class="callout-body d-flex">
+<div data-type="important">
+<div class="callout-body d-flex">
 <div class="callout-icon-container">
 <i class="callout-icon"/>
 </div>

-</div>
+<p>This chapter relies on features only found in dplyr 1.1.0, which is still in development. If you want to live on the edge, you can get the dev versions with <code>devtools::install_github("tidyverse/dplyr")</code>.</p>

-<p>This chapter relies on features only found in dplyr 1.1.0, which is still in development. If you want to live on the edge, you can get the dev versions with <code>devtools::install_github("tidyverse/dplyr")</code>.</p></div>
+</div>
+</div>

 <p>This chapter mostly uses functions from base R, which are available without loading any packages. But we still need the tidyverse because we’ll use these base R functions inside of tidyverse functions like <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>. Like in the last chapter, we’ll use real examples from nycflights13, as well as toy examples made with <code><a href="https://rdrr.io/r/base/c.html">c()</a></code> and <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tribble()</a></code>.</p>
 <div class="cell">
@@ -109,9 +111,7 @@ Counts</h1>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">flights |&gt; 
  group_by(dest) |&gt; 
-  summarize(
-    carriers = n_distinct(carrier)
-  ) |&gt; 
+  summarize(carriers = n_distinct(carrier)) |&gt; 
  arrange(desc(carriers))
 #&gt; # A tibble: 105 × 2
 #&gt;   dest  carriers
@@ -144,17 +144,7 @@ Counts</h1>
 </div>
 <p>Weighted counts are a common problem so <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code> has a <code>wt</code> argument that does the same thing:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">flights |&gt; count(tailnum, wt = distance)
-#&gt; # A tibble: 4,044 × 2
-#&gt;   tailnum      n
-#&gt;   &lt;chr&gt;    &lt;dbl&gt;
-#&gt; 1 D942DN    3418
-#&gt; 2 N0EGMQ  250866
-#&gt; 3 N10156  115966
-#&gt; 4 N102UW   25722
-#&gt; 5 N103US   24619
-#&gt; 6 N104UW   25157
-#&gt; # … with 4,038 more rows</pre>
+<pre data-type="programlisting" data-code-language="r">flights |&gt; count(tailnum, wt = distance)</pre>
 </div>
 </li>
 <li>
@@ -176,7 +166,7 @@ Counts</h1>
 </div>
 </li>
 </ul>
-<section id="exercises" data-type="sect2">
+<section id="numbers-exercises" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li>How can you use <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code> to count the number of rows with a missing value for a given variable?</li>
@@ -228,9 +218,7 @@ x * c(1, 2, 3)
 #&gt; 5  2013     1     1      557            600        -3      838            846
 #&gt; 6  2013     1     1      558            600        -2      849            851
 #&gt; # … with 25,971 more rows, and 11 more variables: arr_delay &lt;dbl&gt;,
-#&gt; #   carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;, origin &lt;chr&gt;, dest &lt;chr&gt;,
-#&gt; #   air_time &lt;dbl&gt;, distance &lt;dbl&gt;, hour &lt;dbl&gt;, minute &lt;dbl&gt;,
-#&gt; #   time_hour &lt;dttm&gt;</pre>
+#&gt; #   carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;, origin &lt;chr&gt;, dest &lt;chr&gt;, …</pre>
 </div>
 <p>The code runs without error, but it doesn’t return what you want. Because of the recycling rules it finds flights in odd numbered rows that departed in January and flights in even numbered rows that departed in February. And unfortunately there’s no warning because <code>flights</code> has an even number of rows.</p>
 <p>To protect you from this type of silent failure, most tidyverse functions use a stricter form of recycling that only recycles single values. Unfortunately that doesn’t help here, or in many other cases, because the key computation is performed by the base R function <code>==</code>, not <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>.</p>
@@ -476,7 +464,7 @@ slide_vec(x, sum, .before = 2, .after = 2, .complete = TRUE)
 </div>
 </section>

-<section id="exercises-1" data-type="sect2">
+<section id="numbers-exercises-1" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>Explain in words what each line of the code used to generate <a href="#fig-prop-cancelled" data-type="xref">#fig-prop-cancelled</a> does.</p></li>
@@ -671,7 +659,7 @@ df
 </div>
 </section>

-<section id="exercises-2" data-type="sect2">
+<section id="numbers-exercises-2" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>Find the 10 most delayed flights using a ranking function. How do you want to handle ties? Carefully read the documentation for <code><a href="https://dplyr.tidyverse.org/reference/row_number.html">min_rank()</a></code>.</p></li>
@@ -718,10 +706,8 @@ Center</h2>
    .groups = "drop"
  ) |&gt; 
  ggplot(aes(x = mean, y = median)) + 
-  geom_abline(slope = 1, intercept = 0, color = "white", size = 2) +
-  geom_point()
-#&gt; Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
-#&gt; ℹ Please use `linewidth` instead.</pre>
+  geom_abline(slope = 1, intercept = 0, color = "white", linewidth = 2) +
+  geom_point()</pre>
 <div class="cell-output-display">

 <figure id="fig-mean-vs-median"><p><img src="numbers_files/figure-html/fig-mean-vs-median-1.png" alt="All points fall below a 45° line, meaning that the median delay is always less than the mean delay. Most points are clustered in a dense region of mean [0, 20] and median [0, 5]. As the mean delay increases, the spread of the median also increases. There are two outlying points with mean ~60, median ~50, and mean ~85, median ~55." width="576"/></p>
@@ -875,15 +861,13 @@ Positions</h2>
 #&gt; 5  2013     1     2       42           2359        43      518            442
 #&gt; 6  2013     1     2      458            500        -2      703            650
 #&gt; # … with 1,189 more rows, and 12 more variables: arr_delay &lt;dbl&gt;,
-#&gt; #   carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;, origin &lt;chr&gt;, dest &lt;chr&gt;,
-#&gt; #   air_time &lt;dbl&gt;, distance &lt;dbl&gt;, hour &lt;dbl&gt;, minute &lt;dbl&gt;,
-#&gt; #   time_hour &lt;dttm&gt;, r &lt;int&gt;</pre>
+#&gt; #   carrier &lt;chr&gt;, flight &lt;int&gt;, tailnum &lt;chr&gt;, origin &lt;chr&gt;, dest &lt;chr&gt;, …</pre>
 </div>
 </section>

 <section id="with-mutate" data-type="sect2">
 <h2>
-With<code>mutate()</code>
+With mutate()
 </h2>
 <p>As the names suggest, the summary functions are typically paired with <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>. However, because of the recycling rules we discussed in <a href="#sec-recycling" data-type="xref">#sec-recycling</a> they can also be usefully paired with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, particularly when you want to do some sort of group standardization. For example:</p>
 <ul><li>
@@ -894,7 +878,7 @@ With<code>mutate()</code>
 <code>x / first(x)</code> computes an index based on the first observation.</li>
 </ul></section>

-<section id="exercises-3" data-type="sect2">
+<section id="numbers-exercises-3" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li>
@@ -910,7 +894,7 @@ Exercises</h2>
 </ol></section>
 </section>

-<section id="summary" data-type="sect1">
+<section id="numbers-summary" data-type="sect1">
 <h1>
 Summary</h1>
 <p>You’re already familiar with many tools for working with numbers, and after reading this chapter you now know how to use them in R. You’ve also learned a handful of useful general transformations that are commonly, but not exclusively, applied to numeric vectors like ranks and offsets. Finally, you worked through a number of numeric summaries, and discussed a few of the statistical challenges that you should consider.</p>