More work on O'Reilly book

* Make width narrower
* Convert deps to table
* Strip chapter status
2022-11-18 11:05:00 -06:00
parent 5895db09cd
commit 69b4597f3b
33 changed files with 784 additions and 1048 deletions
--- a/oreilly/data-import.html
+++ b/oreilly/data-import.html
@@ -1,13 +1,5 @@
 <section data-type="chapter" id="chp-data-import">
-<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><div data-type="note"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-
-</div>
-
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
-
+<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
 <section id="introduction" data-type="sect1">
 <h1>
 Introduction</h1>
@@ -83,7 +75,7 @@ Reading data from a file</h1>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">students &lt;- read_csv("data/students.csv")
 #&gt; Rows: 6 Columns: 5
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
 #&gt; Delimiter: ","
 #&gt; chr (4): Full Name, favourite.food, mealPlan, AGE
 #&gt; dbl (1): Student ID
@@ -324,7 +316,7 @@ Guessing types</h2>
  T,Inf,2021-02-16,ghi"
 )
 #&gt; Rows: 3 Columns: 4
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
 #&gt; Delimiter: ","
 #&gt; chr  (1): string
 #&gt; dbl  (1): numeric
@@ -360,7 +352,7 @@ Missing values, column types, and problems</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv)
 #&gt; Rows: 4 Columns: 1
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
 #&gt; Delimiter: ","
 #&gt; chr (1): x
 #&gt; 
@@ -370,8 +362,8 @@ Missing values, column types, and problems</h2>
 <p>In this very small case, you can easily see the missing value <code>.</code>. But what happens if you have thousands of rows with only a few missing values represented by <code>.</code>s speckled amongst them? One approach is to tell readr that <code>x</code> is a numeric column, and then see where it fails. You can do that with the <code>col_types</code> argument, which takes a named list:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv, col_types = list(x = col_double()))
-#&gt; Warning: One or more parsing issues, call `problems()` on your data frame for details,
-#&gt; e.g.:
+#&gt; Warning: One or more parsing issues, call `problems()` on your data frame for
+#&gt; details, e.g.:
 #&gt;   dat &lt;- vroom(...)
 #&gt;   problems(dat)</pre>
 </div>
@@ -381,13 +373,13 @@ Missing values, column types, and problems</h2>
 #&gt; # A tibble: 1 × 5
 #&gt;     row   col expected actual file                                    
 #&gt;   &lt;int&gt; &lt;int&gt; &lt;chr&gt;    &lt;chr&gt;  &lt;chr&gt;                                   
-#&gt; 1     3     1 a double .      /private/tmp/Rtmp43JYhG/file7cf337a06034</pre>
+#&gt; 1     3     1 a double .      /private/tmp/Rtmpc2nAIe/file8f2f488fc2f4</pre>
 </div>
 <p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. If we then set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df &lt;- read_csv(csv, na = ".")
 #&gt; Rows: 4 Columns: 1
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
 #&gt; Delimiter: ","
 #&gt; dbl (1): x
 #&gt; 
@@ -447,7 +439,7 @@ Reading data from multiple files</h1>
 <pre data-type="programlisting" data-code-language="downlit">sales_files &lt;- c("data/01-sales.csv", "data/02-sales.csv", "data/03-sales.csv")
 read_csv(sales_files, id = "file")
 #&gt; Rows: 19 Columns: 6
-#&gt; ── Column specification ────────────────────────────────────────────────────────
+#&gt; ── Column specification ─────────────────────────────────────────────────────
 #&gt; Delimiter: ","
 #&gt; chr (1): month
 #&gt; dbl (4): year, brand, item, n