More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
 <section data-type="chapter" id="chp-data-import">
-<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><div data-type="note"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-
-</div>
-
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
-
+<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
 <section id="introduction" data-type="sect1">
 <h1>
 Introduction</h1>
@@ -83,7 +75,7 @@ Reading data from a file</h1>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">students <- read_csv("data/students.csv")
 #> Rows: 6 Columns: 5
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (4): Full Name, favourite.food, mealPlan, AGE
 #> dbl (1): Student ID
@@ -324,7 +316,7 @@ Guessing types</h2>
 T,Inf,2021-02-16,ghi"
 )
 #> Rows: 3 Columns: 4
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (1): string
 #> dbl (1): numeric
@@ -360,7 +352,7 @@ Missing values, column types, and problems</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv)
 #> Rows: 4 Columns: 1
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (1): x
 #>
@@ -370,8 +362,8 @@ Missing values, column types, and problems</h2>
 <p>In this very small case, you can easily see the missing value <code>.</code>. But what happens if you have thousands of rows with only a few missing values represented by <code>.</code>s speckled amongst them? One approach is to tell readr that <code>x</code> is a numeric column, and then see where it fails. You can do that with the <code>col_types</code> argument, which takes a named list:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv, col_types = list(x = col_double()))
-#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
-#> e.g.:
+#> Warning: One or more parsing issues, call `problems()` on your data frame for
+#> details, e.g.:
 #> dat <- vroom(...)
 #> problems(dat)</pre>
 </div>
@@ -381,13 +373,13 @@ Missing values, column types, and problems</h2>
 #> # A tibble: 1 × 5
 #> row col expected actual file
 #> <int> <int> <chr> <chr> <chr>
-#> 1 3 1 a double . /private/tmp/Rtmp43JYhG/file7cf337a06034</pre>
+#> 1 3 1 a double . /private/tmp/Rtmpc2nAIe/file8f2f488fc2f4</pre>
 </div>
 <p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. So then we set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv, na = ".")
 #> Rows: 4 Columns: 1
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> dbl (1): x
 #>
@@ -447,7 +439,7 @@ Reading data from multiple files</h1>
 <pre data-type="programlisting" data-code-language="downlit">sales_files <- c("data/01-sales.csv", "data/02-sales.csv", "data/03-sales.csv")
 read_csv(sales_files, id = "file")
 #> Rows: 19 Columns: 6
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (1): month
 #> dbl (4): year, brand, item, n
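Reviewer note: the hunks under "Missing values, column types, and problems" only re-wrap console output, so the workflow they document should behave the same before and after this commit. To check that locally, here is a minimal sketch, assuming readr 2.0 or later is installed. The inline `csv` value is a hypothetical stand-in, since the chapter's real `csv` object is not part of this diff.

```r
library(readr)

# Hypothetical stand-in for the chapter's `csv` input (not shown in this diff):
# a single column `x` in which "." marks a missing value.
csv <- I("x\n10\n.\n20\n30")

# Step 1: force `x` to be a double so that any non-numeric entry is recorded
# as a parsing problem instead of silently turning the column into character.
df <- read_csv(csv, col_types = list(x = col_double()))
problems(df)  # lists the row/column where a double was expected

# Step 2: once the marker is known, declare it as NA and let type guessing
# proceed on its own.
df <- read_csv(csv, na = ".", show_col_types = FALSE)
```

The temporary file path printed by `problems()` differs on every run, which is why that line changes in the diff even though nothing about the code did.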