More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
 <section data-type="chapter" id="chp-data-import">
-<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><div data-type="note"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-
-</div>
-
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
-
+<h1><span id="sec-data-import" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Data import</span></span></h1><p>::: status callout-note You are reading the work-in-progress second edition of R for Data Science. This chapter should be readable but is currently undergoing final polishing. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
 <section id="introduction" data-type="sect1">
 <h1>
 Introduction</h1>
@@ -83,7 +75,7 @@ Reading data from a file</h1>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">students <- read_csv("data/students.csv")
 #> Rows: 6 Columns: 5
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (4): Full Name, favourite.food, mealPlan, AGE
 #> dbl (1): Student ID
@@ -324,7 +316,7 @@ Guessing types</h2>
   T,Inf,2021-02-16,ghi"
 )
 #> Rows: 3 Columns: 4
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr  (1): string
 #> dbl  (1): numeric
@@ -360,7 +352,7 @@ Missing values, column types, and problems</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv)
 #> Rows: 4 Columns: 1
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (1): x
 #> 
@@ -370,8 +362,8 @@ Missing values, column types, and problems</h2>
 <p>In this very small case, you can easily see the missing value <code>.</code>. But what happens if you have thousands of rows with only a few missing values represented by <code>.</code>s speckled amongst them? One approach is to tell readr that <code>x</code> is a numeric column, and then see where it fails. You can do that with the <code>col_types</code> argument, which takes a named list:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv, col_types = list(x = col_double()))
-#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
-#> e.g.:
+#> Warning: One or more parsing issues, call `problems()` on your data frame for
+#> details, e.g.:
 #>   dat <- vroom(...)
 #>   problems(dat)</pre>
 </div>
@@ -381,13 +373,13 @@ Missing values, column types, and problems</h2>
 #> # A tibble: 1 × 5
 #>     row   col expected actual file                                    
 #>   <int> <int> <chr>    <chr>  <chr>                                   
-#> 1     3     1 a double .      /private/tmp/Rtmp43JYhG/file7cf337a06034</pre>
+#> 1     3     1 a double .      /private/tmp/Rtmpc2nAIe/file8f2f488fc2f4</pre>
 </div>
 <p>This tells us that there was a problem in row 3, col 1 where readr expected a double but got a <code>.</code>. That suggests this dataset uses <code>.</code> for missing values. So then we set <code>na = "."</code>, the automatic guessing succeeds, giving us the numeric column that we want:</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">df <- read_csv(csv, na = ".")
 #> Rows: 4 Columns: 1
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> dbl (1): x
 #> 
@@ -447,7 +439,7 @@ Reading data from multiple files</h1>
 <pre data-type="programlisting" data-code-language="downlit">sales_files <- c("data/01-sales.csv", "data/02-sales.csv", "data/03-sales.csv")
 read_csv(sales_files, id = "file")
 #> Rows: 19 Columns: 6
-#> ── Column specification ────────────────────────────────────────────────────────
+#> ── Column specification ─────────────────────────────────────────────────────
 #> Delimiter: ","
 #> chr (1): month
 #> dbl (4): year, brand, item, n