More work on O'Reilly book

* Make width narrower
* Convert deps to table
* Strip chapter status
Hadley Wickham
2022-11-18 11:05:00 -06:00
parent 5895db09cd
commit 69b4597f3b
33 changed files with 784 additions and 1048 deletions

@@ -1,13 +1,5 @@
<section data-type="chapter" id="chp-spreadsheets">
<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><div data-type="important"><div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"/>
</div>
</div>
<p>You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don't recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><p>::: status callout-important You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don't recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
<section id="introduction" data-type="sect1">
<h1>
Introduction</h1>
@@ -197,16 +189,16 @@ Reading individual sheets</h2>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
#&gt; # A tibble: 52 × 8
#&gt; species island bill_length_mm bill_depth_mm flipp…¹ body_…² sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.399999999… 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.299999999999997 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA NA 2007
#&gt; 5 Adelie Torgersen 36.700000000000003 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.299999999999997 20.6 190 3650 male 2007
#&gt; # … with 46 more rows, and abbreviated variable names ¹​flipper_length_mm,
#&gt; # ²body_mass_g</pre>
#&gt; species island bill_length_mm bill_dep…¹ flipp…² body_…³ sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.399999… 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.299999999999997 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA NA 2007
#&gt; 5 Adelie Torgersen 36.700000000000003 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.299999999999997 20.6 190 3650 male 2007
#&gt; # … with 46 more rows, and abbreviated variable names ¹​bill_depth_mm,
#&gt; # ²​flipper_length_mm, ³​body_mass_g</pre>
</div>
<p>Some variables that appear to contain numerical data are read in as characters because the character string <code>"NA"</code> is not recognized as a true <code>NA</code>.</p>
<div class="cell">
@@ -214,14 +206,14 @@ Reading individual sheets</h2>
penguins_torgersen
#&gt; # A tibble: 52 × 8
#&gt; species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA &lt;NA&gt; 2007
#&gt; 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
#&gt; species island bill_length_mm bill_depth_mm flippe…¹ body_…² sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA &lt;NA&gt; 2007
#&gt; 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
#&gt; # … with 46 more rows, and abbreviated variable names ¹flipper_length_mm,
#&gt; # ²body_mass_g</pre>
</div>
@@ -249,14 +241,14 @@ dim(penguins_dream)
<pre data-type="programlisting" data-code-language="downlit">penguins &lt;- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
penguins
#&gt; # A tibble: 344 × 8
#&gt; species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA &lt;NA&gt; 2007
#&gt; 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
#&gt; species island bill_length_mm bill_depth_mm flippe…¹ body_…² sex year
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
#&gt; 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
#&gt; 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
#&gt; 4 Adelie Torgersen NA NA NA NA &lt;NA&gt; 2007
#&gt; 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
#&gt; 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
#&gt; # … with 338 more rows, and abbreviated variable names ¹flipper_length_mm,
#&gt; # ²body_mass_g</pre>
</div>
@@ -287,14 +279,14 @@ deaths &lt;- read_excel(deaths_path)
#&gt; • `` -&gt; `...6`
deaths
#&gt; # A tibble: 18 × 6
#&gt; `Lots of people` ...2 ...3 ...4 ...5 ...6
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 simply cannot resist writing &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; some not
#&gt; 2 at the top &lt;NA&gt; of their sp
#&gt; 3 or merging &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; cells
#&gt; 4 Name Profession Age Has kids Date of birth Date of
#&gt; 5 David Bowie musician 69 TRUE 17175 42379
#&gt; 6 Carrie Fisher actor 60 TRUE 20749 42731
#&gt; `Lots of people` ...2 ...3 ...4 ...5 ...6
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 simply cannot resist writing &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; some …
#&gt; 2 at the top &lt;NA&gt; of their…
#&gt; 3 or merging &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; cells
#&gt; 4 Name Profession Age Has kids Date of birth Date …
#&gt; 5 David Bowie musician 69 TRUE 17175 42379
#&gt; 6 Carrie Fisher actor 60 TRUE 20749 42731
#&gt; # … with 12 more rows</pre>
</div>
<p>The top three rows and the bottom four rows are not part of the data frame.</p>
@@ -302,29 +294,30 @@ deaths
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4)
#&gt; # A tibble: 14 × 6
#&gt; Name Profession Age `Has kids` `Date of birth` `Date of death`
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dttm&gt; &lt;chr&gt;
#&gt; 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 42379
#&gt; 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 42731
#&gt; 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 42812
#&gt; 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 42791
#&gt; 5 Prince musician 57 TRUE 1958-06-07 00:00:00 42481
#&gt; 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 42383
#&gt; # … with 8 more rows</pre>
#&gt; Name Profession Age `Has kids` `Date of birth` Date of dea…¹
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;dttm&gt; &lt;chr&gt;
#&gt; 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 42379
#&gt; 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 42731
#&gt; 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 42812
#&gt; 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 42791
#&gt; 5 Prince musician 57 TRUE 1958-06-07 00:00:00 42481
#&gt; 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 42383
#&gt; # … with 8 more rows, and abbreviated variable name ¹​`Date of death`</pre>
</div>
<p>We could also set <code>n_max</code> to omit the extraneous rows at the bottom.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4, n_max = 10)
#&gt; # A tibble: 10 × 6
#&gt; Name Profession Age Has k…¹ `Date of birth` `Date of death`
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;lgl&gt; &lt;dttm&gt; &lt;dttm&gt;
#&gt; 1 David Bowie musician 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
#&gt; 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
#&gt; 3 Chuck Berry musician 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
#&gt; 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
#&gt; 5 Prince musician 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
#&gt; 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
#&gt; # … with 4 more rows, and abbreviated variable name ¹​`Has kids`</pre>
#&gt; Name Profe…¹ Age Has k…² `Date of birth` `Date of death`
#&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;lgl&gt; &lt;dttm&gt; &lt;dttm&gt;
#&gt; 1 David Bowie musici 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
#&gt; 2 Carrie Fisher actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
#&gt; 3 Chuck Berry musici 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
#&gt; 4 Bill Paxton actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
#&gt; 5 Prince musici 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
#&gt; 6 Alan Rickman actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
#&gt; # … with 4 more rows, and abbreviated variable names ¹​Profession,
#&gt; # ²​`Has kids`</pre>
</div>
<p>Another approach is to use cell ranges. In Excel, the top-left cell is <code>A1</code>. As you move across columns to the right, the letter in the cell label advances through the alphabet: <code>B1</code>, <code>C1</code>, and so on. As you move down a column, the number in the cell label increases: <code>A2</code>, <code>A3</code>, and so on.</p>
<p>The data we want to read in starts in cell <code>A5</code> and ends in cell <code>F15</code>. In spreadsheet notation, this is <code>A5:F15</code>.</p>
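<p>A minimal sketch of reading that rectangle with the <code>range</code> argument of <code>read_excel()</code>. It assumes <code>deaths_path</code> points to the <code>deaths.xlsx</code> example workbook that ships with readxl, located here with <code>readxl_example()</code>; the surrounding hunks do not show how the path is defined, so treat that line as an assumption.</p>

```r
library(readxl)

# Assumption: deaths_path refers to the example workbook bundled with readxl.
deaths_path <- readxl_example("deaths.xlsx")

# Read only the rectangle A5:F15: the header row in row 5 plus the data rows
# below it. The notes above and below the table fall outside the range.
deaths <- read_excel(deaths_path, range = "A5:F15")
deaths
```

<p>Because the range pins down both the first and the last cell, it should do the work of <code>skip</code> and <code>n_max</code> in a single argument.</p>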