More minor page count tweaks & fixes

And re-convert with latest htmlbook
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions
--- a/oreilly/data-import.html
+++ b/oreilly/data-import.html
@@ -1,12 +1,12 @@
 <section data-type="chapter" id="chp-data-import">
 <h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1>
-<section id="introduction" data-type="sect1">
+<section id="data-import-introduction" data-type="sect1">
 <h1>
 Introduction</h1>
 <p>Working with data provided by R packages is a great way to learn data science tools, but you want to apply what you’ve learned to your own data at some point. In this chapter, you’ll learn the basics of reading data files into R.</p>
 <p>Specifically, this chapter will focus on reading plain-text rectangular files. We’ll start with practical advice for handling features like column names, types, and missing data. You will then learn about reading data from multiple files at once and writing data from R to a file. Finally, you’ll learn how to handcraft data frames in R.</p>

-<section id="prerequisites" data-type="sect2">
+<section id="data-import-prerequisites" data-type="sect2">
 <h2>
 Prerequisites</h2>
 <p>In this chapter, you’ll learn how to load flat files in R with the <strong>readr</strong> package, which is part of the core tidyverse.</p>
@@ -257,7 +257,7 @@ Other file types</h2>
 <li><p><code><a href="https://readr.tidyverse.org/reference/read_log.html">read_log()</a></code> reads Apache-style log files.</p></li>
 </ul></section>

-<section id="exercises" data-type="sect2">
+<section id="data-import-exercises" data-type="sect2">
 <h2>
 Exercises</h2>
 <ol type="1"><li><p>What function would you use to read a file where fields were separated with “|”?</p></li>
@@ -372,9 +372,9 @@ Missing values, column types, and problems</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">problems(df)
 #&gt; # A tibble: 1 × 5
-#&gt;     row   col expected actual file                                     
-#&gt;   &lt;int&gt; &lt;int&gt; &lt;chr&gt;    &lt;chr&gt;  &lt;chr&gt;                                    
-#&gt; 1     3     1 a double .      /private/tmp/Rtmp1nE0XP/file11b88112257a4</pre>
+#&gt;     row   col expected actual file                                    
+#&gt;   &lt;int&gt; &lt;int&gt; &lt;chr&gt;    &lt;chr&gt;  &lt;chr&gt;                                   
+#&gt; 1     3     1 a double .      /private/tmp/Rtmpx37bAU/filec1bb57d587a7</pre>
 </div>
 <p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. So then we set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
 <div class="cell">
@@ -584,7 +584,7 @@ Data entry</h1>
 <p>We’ll use <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> and <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tribble()</a></code> later in the book to construct small examples to demonstrate how various functions work.</p>
 </section>

-<section id="summary" data-type="sect1">
+<section id="data-import-summary" data-type="sect1">
 <h1>
 Summary</h1>
 <p>In this chapter, you’ve learned how to load CSV files with <code><a href="https://readr.tidyverse.org/reference/read_delim.html">read_csv()</a></code> and to do your own data entry with <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> and <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tribble()</a></code>. You’ve learned how CSV files work, some of the problems you might encounter, and how to overcome them. We’ll come back to data import a few times in this book: <a href="#chp-spreadsheets" data-type="xref">#chp-spreadsheets</a> will show you how to load data from Excel and Google Sheets, <a href="#chp-databases" data-type="xref">#chp-databases</a> from databases, <a href="#chp-arrow" data-type="xref">#chp-arrow</a> from parquet files, <a href="#chp-rectangling" data-type="xref">#chp-rectangling</a> from JSON, and <a href="#chp-webscraping" data-type="xref">#chp-webscraping</a> from websites.</p>