More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions


@@ -1,12 +1,12 @@
<section data-type="chapter" id="chp-data-import">
<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1>
-<section id="introduction" data-type="sect1">
+<section id="data-import-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>Working with data provided by R packages is a great way to learn data science tools, but you want to apply what you’ve learned to your own data at some point. In this chapter, you’ll learn the basics of reading data files into R.</p>
<p>Specifically, this chapter will focus on reading plain-text rectangular files. We’ll start with practical advice for handling features like column names, types, and missing data. You will then learn about reading data from multiple files at once and writing data from R to a file. Finally, you’ll learn how to handcraft data frames in R.</p>
-<section id="prerequisites" data-type="sect2">
+<section id="data-import-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>In this chapter, you’ll learn how to load flat files in R with the <strong>readr</strong> package, which is part of the core tidyverse.</p>
@@ -257,7 +257,7 @@ Other file types</h2>
<li><p><code><a href="https://readr.tidyverse.org/reference/read_log.html">read_log()</a></code> reads Apache-style log files.</p></li>
</ul></section>
-<section id="exercises" data-type="sect2">
+<section id="data-import-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>What function would you use to read a file where fields were separated with “|”?</p></li>
@@ -372,9 +372,9 @@ Missing values, column types, and problems</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">problems(df)
#&gt; # A tibble: 1 × 5
-#&gt; row col expected actual file
-#&gt; &lt;int&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
-#&gt; 1 3 1 a double . /private/tmp/Rtmp1nE0XP/file11b88112257a4</pre>
+#&gt; row col expected actual file
+#&gt; &lt;int&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
+#&gt; 1 3 1 a double . /private/tmp/Rtmpx37bAU/filec1bb57d587a7</pre>
</div>
<p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. So then we set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
<div class="cell">
@@ -584,7 +584,7 @@ Data entry</h1>
<p>We’ll use <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> and <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tribble()</a></code> later in the book to construct small examples to demonstrate how various functions work.</p>
</section>
-<section id="summary" data-type="sect1">
+<section id="data-import-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, you’ve learned how to load CSV files with <code><a href="https://readr.tidyverse.org/reference/read_delim.html">read_csv()</a></code> and to do your own data entry with <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> and <code><a href="https://tibble.tidyverse.org/reference/tribble.html">tribble()</a></code>. You’ve learned how CSV files work, some of the problems you might encounter, and how to overcome them. We’ll come to data import a few times in this book: <a href="#chp-spreadsheets" data-type="xref">#chp-spreadsheets</a> from Excel and googlesheets, <a href="#chp-databases" data-type="xref">#chp-databases</a> will show you how to load data from databases, <a href="#chp-arrow" data-type="xref">#chp-arrow</a> from parquet files, <a href="#chp-rectangling" data-type="xref">#chp-rectangling</a> from JSON, and <a href="#chp-webscraping" data-type="xref">#chp-webscraping</a> from websites.</p>