More work on O'Reilly book

* Make width narrower
* Convert deps to table
* Strip chapter status
Hadley Wickham
2022-11-18 11:05:00 -06:00
parent 5895db09cd
commit 69b4597f3b
33 changed files with 784 additions and 1048 deletions


@@ -1,13 +1,5 @@
<section data-type="chapter" id="chp-data-import">
-<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><div data-type="note"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-</div>
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
+<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
<section id="introduction" data-type="sect1">
<h1>
Introduction</h1>
@@ -83,7 +75,7 @@ Reading data from a file</h1>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">students &lt;- read_csv("data/students.csv")
#&gt; Rows: 6 Columns: 5
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
#&gt; Delimiter: ","
#&gt; chr (4): Full Name, favourite.food, mealPlan, AGE
#&gt; dbl (1): Student ID
@@ -324,7 +316,7 @@ Guessing types</h2>
T,Inf,2021-02-16,ghi"
)
#&gt; Rows: 3 Columns: 4
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
#&gt; Delimiter: ","
#&gt; chr (1): string
#&gt; dbl (1): numeric
@@ -360,7 +352,7 @@ Missing values, column types, and problems</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv)
#&gt; Rows: 4 Columns: 1
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
#&gt; Delimiter: ","
#&gt; chr (1): x
#&gt;
@@ -370,8 +362,8 @@ Missing values, column types, and problems</h2>
<p>In this very small case, you can easily see the missing value <code>.</code>. But what happens if you have thousands of rows with only a few missing values represented by <code>.</code>s speckled amongst them? One approach is to tell readr that <code>x</code> is a numeric column, and then see where it fails. You can do that with the <code>col_types</code> argument, which takes a named list:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv, col_types = list(x = col_double()))
-#&gt; Warning: One or more parsing issues, call `problems()` on your data frame for details,
-#&gt; e.g.:
+#&gt; Warning: One or more parsing issues, call `problems()` on your data frame for
+#&gt; details, e.g.:
#&gt; dat &lt;- vroom(...)
#&gt; problems(dat)</pre>
</div>
@@ -381,13 +373,13 @@ Missing values, column types, and problems</h2>
#&gt; # A tibble: 1 × 5
#&gt; row col expected actual file
#&gt; &lt;int&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
-#&gt; 1 3 1 a double . /private/tmp/Rtmp43JYhG/file7cf337a06034</pre>
+#&gt; 1 3 1 a double . /private/tmp/Rtmpc2nAIe/file8f2f488fc2f4</pre>
</div>
<p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. So then we set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv, na = ".")
#&gt; Rows: 4 Columns: 1
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
#&gt; Delimiter: ","
#&gt; dbl (1): x
#&gt;
@@ -447,7 +439,7 @@ Reading data from multiple files</h1>
<pre data-type="programlisting" data-code-language="downlit">sales_files &lt;- c("data/01-sales.csv", "data/02-sales.csv", "data/03-sales.csv")
read_csv(sales_files, id = "file")
#&gt; Rows: 19 Columns: 6
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
#&gt; Delimiter: ","
#&gt; chr (1): month
#&gt; dbl (4): year, brand, item, n