More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
 <section data-type="chapter" id="chp-spreadsheets">
-<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><div data-type="important"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-
-</div>
-
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
-
+<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><p>::: status callout-important You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
 <section id="introduction" data-type="sect1">
 <h1>
 Introduction</h1>
@@ -197,16 +189,16 @@ Reading individual sheets</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
 #> # A tibble: 52 × 8
-#> species island bill_length_mm bill_depth_mm flipp…¹ body_…² sex year
-#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
-#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
-#> 2 Adelie Torgersen 39.5 17.399999999… 186 3800 fema… 2007
-#> 3 Adelie Torgersen 40.299999999999997 18 195 3250 fema… 2007
-#> 4 Adelie Torgersen NA NA NA NA NA 2007
-#> 5 Adelie Torgersen 36.700000000000003 19.3 193 3450 fema… 2007
-#> 6 Adelie Torgersen 39.299999999999997 20.6 190 3650 male 2007
-#> # … with 46 more rows, and abbreviated variable names ¹flipper_length_mm,
-#> # ²body_mass_g</pre>
+#> species island bill_length_mm bill_dep…¹ flipp…² body_…³ sex year
+#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
+#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
+#> 2 Adelie Torgersen 39.5 17.399999… 186 3800 fema… 2007
+#> 3 Adelie Torgersen 40.299999999999997 18 195 3250 fema… 2007
+#> 4 Adelie Torgersen NA NA NA NA NA 2007
+#> 5 Adelie Torgersen 36.700000000000003 19.3 193 3450 fema… 2007
+#> 6 Adelie Torgersen 39.299999999999997 20.6 190 3650 male 2007
+#> # … with 46 more rows, and abbreviated variable names ¹bill_depth_mm,
+#> # ²flipper_length_mm, ³body_mass_g</pre>
 </div>
 <p>Some variables that appear to contain numerical data are read in as characters due to the character string <code>"NA"</code> not being recognized as a true <code>NA</code>.</p>
 <div class="cell">
@@ -214,14 +206,14 @@ Reading individual sheets</h2>
 
 penguins_torgersen
 #> # A tibble: 52 × 8
-#> species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
-#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
-#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
-#> 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
-#> 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
-#> 4 Adelie Torgersen NA NA NA NA <NA> 2007
-#> 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
-#> 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
+#> species island bill_length_mm bill_depth_mm flippe…¹ body_…² sex year
+#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
+#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
+#> 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
+#> 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
+#> 4 Adelie Torgersen NA NA NA NA <NA> 2007
+#> 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
+#> 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
 #> # … with 46 more rows, and abbreviated variable names ¹flipper_length_mm,
 #> # ²body_mass_g</pre>
 </div>
@@ -249,14 +241,14 @@ dim(penguins_dream)
 <pre data-type="programlisting" data-code-language="downlit">penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
 penguins
 #> # A tibble: 344 × 8
-#> species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
-#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
-#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
-#> 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
-#> 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
-#> 4 Adelie Torgersen NA NA NA NA <NA> 2007
-#> 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
-#> 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
+#> species island bill_length_mm bill_depth_mm flippe…¹ body_…² sex year
+#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
+#> 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
+#> 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
+#> 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
+#> 4 Adelie Torgersen NA NA NA NA <NA> 2007
+#> 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
+#> 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
 #> # … with 338 more rows, and abbreviated variable names ¹flipper_length_mm,
 #> # ²body_mass_g</pre>
 </div>
@@ -287,14 +279,14 @@ deaths <- read_excel(deaths_path)
 #> • `` -> `...6`
 deaths
 #> # A tibble: 18 × 6
-#> `Lots of people` ...2 ...3 ...4 ...5 ...6
-#> <chr> <chr> <chr> <chr> <chr> <chr>
-#> 1 simply cannot resist writing <NA> <NA> <NA> <NA> some not…
-#> 2 at the top <NA> of their sp…
-#> 3 or merging <NA> <NA> <NA> cells
-#> 4 Name Profession Age Has kids Date of birth Date of …
-#> 5 David Bowie musician 69 TRUE 17175 42379
-#> 6 Carrie Fisher actor 60 TRUE 20749 42731
+#> `Lots of people` ...2 ...3 ...4 ...5 ...6
+#> <chr> <chr> <chr> <chr> <chr> <chr>
+#> 1 simply cannot resist writing <NA> <NA> <NA> <NA> some …
+#> 2 at the top <NA> of their…
+#> 3 or merging <NA> <NA> <NA> cells
+#> 4 Name Profession Age Has kids Date of birth Date …
+#> 5 David Bowie musician 69 TRUE 17175 42379
+#> 6 Carrie Fisher actor 60 TRUE 20749 42731
 #> # … with 12 more rows</pre>
 </div>
 <p>The top three rows and the bottom four rows are not part of the data frame.</p>
@@ -302,29 +294,30 @@ deaths
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4)
 #> # A tibble: 14 × 6
-#> Name Profession Age `Has kids` `Date of birth` `Date of death`
-#> <chr> <chr> <chr> <chr> <dttm> <chr>
-#> 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 42379
-#> 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 42731
-#> 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 42812
-#> 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 42791
-#> 5 Prince musician 57 TRUE 1958-06-07 00:00:00 42481
-#> 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 42383
-#> # … with 8 more rows</pre>
+#> Name Profession Age `Has kids` `Date of birth` Date of dea…¹
+#> <chr> <chr> <chr> <chr> <dttm> <chr>
+#> 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 42379
+#> 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 42731
+#> 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 42812
+#> 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 42791
+#> 5 Prince musician 57 TRUE 1958-06-07 00:00:00 42481
+#> 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 42383
+#> # … with 8 more rows, and abbreviated variable name ¹`Date of death`</pre>
 </div>
 <p>We could also set <code>n_max</code> to omit the extraneous rows at the bottom.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4, n_max = 10)
 #> # A tibble: 10 × 6
-#> Name Profession Age Has k…¹ `Date of birth` `Date of death`
-#> <chr> <chr> <dbl> <lgl> <dttm> <dttm>
-#> 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
-#> 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
-#> 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
-#> 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
-#> 5 Prince musician 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
-#> 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
-#> # … with 4 more rows, and abbreviated variable name ¹`Has kids`</pre>
+#> Name Profe…¹ Age Has k…² `Date of birth` `Date of death`
+#> <chr> <chr> <dbl> <lgl> <dttm> <dttm>
+#> 1 David Bowie musici… 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
+#> 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
+#> 3 Chuck Berry musici… 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
+#> 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
+#> 5 Prince musici… 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
+#> 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
+#> # … with 4 more rows, and abbreviated variable names ¹Profession,
+#> # ²`Has kids`</pre>
 </div>
 <p>Another approach is using cell ranges. In Excel, the top left cell is <code>A1</code>. As you move across columns to the right, the cell label moves down the alphabet, i.e. <code>B1</code>, <code>C1</code>, etc. And as you move down a column, the number in the cell label increases, i.e. <code>A2</code>, <code>A3</code>, etc.</p>
 <p>The data we want to read in starts in cell <code>A5</code> and ends in cell <code>F15</code>. In spreadsheet notation, this is <code>A5:F15</code>.</p>
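
For reference, the cell-range approach described in the last two context paragraphs could be sketched in R roughly as follows. This is a minimal sketch, not part of the commit: it assumes `deaths_path` points at the `deaths.xlsx` example workbook bundled with readxl (located here with `readxl_example()`), which the hunks above do not confirm. The `range` argument of `read_excel()` takes the spreadsheet-style rectangle directly, so `skip` and `n_max` are not needed.

```r
library(readxl)

# Assumption: deaths_path refers to the example workbook shipped with readxl;
# the diff above does not show how the chapter defines it.
deaths_path <- readxl_example("deaths.xlsx")

# Read only the rectangle from A5 (the header row) through F15
# (the last data row), ignoring the notes above and below the table.
deaths <- read_excel(deaths_path, range = "A5:F15")

# Inspect the dimensions and column names of the result.
dim(deaths)
names(deaths)
```

Compared with `skip = 4, n_max = 10`, a range also bounds the columns that are read, which may help when a sheet has stray notes to the right of the table.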