More work on O'Reilly book
* Make width narrower
* Convert deps to table
* Strip chapter status
@@ -1,13 +1,5 @@
 <section data-type="chapter" id="chp-spreadsheets">
-<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><div data-type="important"><div class="callout-body d-flex">
-<div class="callout-icon-container">
-<i class="callout-icon"/>
-</div>
-
-</div>
-
-<p>You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>.</p></div>
-
+<h1><span id="sec-import-spreadsheets" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Spreadsheets</span></span></h1><p>::: status callout-important You are reading the work-in-progress second edition of R for Data Science. This chapter is currently a dumping ground for ideas, and we don’t recommend reading it. You can find the complete first edition at <a href="https://r4ds.had.co.nz" class="uri">https://r4ds.had.co.nz</a>. :::</p>
 <section id="introduction" data-type="sect1">
 <h1>
 Introduction</h1>
@@ -197,16 +189,16 @@ Reading individual sheets</h2>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
 #> # A tibble: 52 × 8
-#>   species island    bill_length_mm     bill_depth_mm flipp…¹ body_…² sex    year
-#>   <chr>   <chr>     <chr>              <chr>         <chr>   <chr>   <chr> <dbl>
-#> 1 Adelie  Torgersen 39.1               18.7          181     3750    male   2007
-#> 2 Adelie  Torgersen 39.5               17.399999999… 186     3800    fema…  2007
-#> 3 Adelie  Torgersen 40.299999999999997 18            195     3250    fema…  2007
-#> 4 Adelie  Torgersen NA                 NA            NA      NA      NA     2007
-#> 5 Adelie  Torgersen 36.700000000000003 19.3          193     3450    fema…  2007
-#> 6 Adelie  Torgersen 39.299999999999997 20.6          190     3650    male   2007
-#> # … with 46 more rows, and abbreviated variable names ¹flipper_length_mm,
-#> #   ²body_mass_g</pre>
+#>   species island    bill_length_mm     bill_dep…¹ flipp…² body_…³ sex    year
+#>   <chr>   <chr>     <chr>              <chr>      <chr>   <chr>   <chr> <dbl>
+#> 1 Adelie  Torgersen 39.1               18.7       181     3750    male   2007
+#> 2 Adelie  Torgersen 39.5               17.399999… 186     3800    fema…  2007
+#> 3 Adelie  Torgersen 40.299999999999997 18         195     3250    fema…  2007
+#> 4 Adelie  Torgersen NA                 NA         NA      NA      NA     2007
+#> 5 Adelie  Torgersen 36.700000000000003 19.3       193     3450    fema…  2007
+#> 6 Adelie  Torgersen 39.299999999999997 20.6       190     3650    male   2007
+#> # … with 46 more rows, and abbreviated variable names ¹bill_depth_mm,
+#> #   ²flipper_length_mm, ³body_mass_g</pre>
 </div>
 <p>Some variables that appear to contain numerical data are read in as characters due to the character string <code>"NA"</code> not being recognized as a true <code>NA</code>.</p>
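One way to handle this is the `na` argument of `read_excel()`, which tells readxl which strings to treat as missing. The sketch below assumes the workbook marks missing values with the literal string `"NA"`. The path and sheet name are taken from the listing above; the file is not bundled with readxl, so the read is guarded.

```r
library(readxl)

# Path and sheet name come from the listing above. The workbook is not
# shipped with readxl, so only read it if it is present locally.
path <- "data/penguins.xlsx"

if (file.exists(path)) {
  # Treat the literal string "NA" as missing so that the measurement
  # columns can be guessed as <dbl> rather than <chr>.
  penguins_torgersen <- read_excel(path, sheet = "Torgersen Island", na = "NA")
  print(penguins_torgersen)
}
```

With the `"NA"` cells treated as missing, the remaining cells in those columns are numeric, so readxl should no longer fall back to character.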
 <div class="cell">
@@ -214,14 +206,14 @@ Reading individual sheets</h2>
 
 penguins_torgersen
 #> # A tibble: 52 × 8
-#>   species island    bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex    year
-#>   <chr>   <chr>              <dbl>         <dbl>       <dbl>   <dbl> <chr> <dbl>
-#> 1 Adelie  Torgersen           39.1          18.7         181    3750 male   2007
-#> 2 Adelie  Torgersen           39.5          17.4         186    3800 fema…  2007
-#> 3 Adelie  Torgersen           40.3          18           195    3250 fema…  2007
-#> 4 Adelie  Torgersen           NA            NA            NA      NA <NA>   2007
-#> 5 Adelie  Torgersen           36.7          19.3         193    3450 fema…  2007
-#> 6 Adelie  Torgersen           39.3          20.6         190    3650 male   2007
+#>   species island    bill_length_mm bill_depth_mm flippe…¹ body_…² sex    year
+#>   <chr>   <chr>              <dbl>         <dbl>    <dbl>   <dbl> <chr> <dbl>
+#> 1 Adelie  Torgersen           39.1          18.7      181    3750 male   2007
+#> 2 Adelie  Torgersen           39.5          17.4      186    3800 fema…  2007
+#> 3 Adelie  Torgersen           40.3          18        195    3250 fema…  2007
+#> 4 Adelie  Torgersen           NA            NA         NA      NA <NA>   2007
+#> 5 Adelie  Torgersen           36.7          19.3      193    3450 fema…  2007
+#> 6 Adelie  Torgersen           39.3          20.6      190    3650 male   2007
 #> # … with 46 more rows, and abbreviated variable names ¹flipper_length_mm,
 #> #   ²body_mass_g</pre>
 </div>
@@ -249,14 +241,14 @@ dim(penguins_dream)
 <pre data-type="programlisting" data-code-language="downlit">penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
 penguins
 #> # A tibble: 344 × 8
-#>   species island    bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex    year
-#>   <chr>   <chr>              <dbl>         <dbl>       <dbl>   <dbl> <chr> <dbl>
-#> 1 Adelie  Torgersen           39.1          18.7         181    3750 male   2007
-#> 2 Adelie  Torgersen           39.5          17.4         186    3800 fema…  2007
-#> 3 Adelie  Torgersen           40.3          18           195    3250 fema…  2007
-#> 4 Adelie  Torgersen           NA            NA            NA      NA <NA>   2007
-#> 5 Adelie  Torgersen           36.7          19.3         193    3450 fema…  2007
-#> 6 Adelie  Torgersen           39.3          20.6         190    3650 male   2007
+#>   species island    bill_length_mm bill_depth_mm flippe…¹ body_…² sex    year
+#>   <chr>   <chr>              <dbl>         <dbl>    <dbl>   <dbl> <chr> <dbl>
+#> 1 Adelie  Torgersen           39.1          18.7      181    3750 male   2007
+#> 2 Adelie  Torgersen           39.5          17.4      186    3800 fema…  2007
+#> 3 Adelie  Torgersen           40.3          18        195    3250 fema…  2007
+#> 4 Adelie  Torgersen           NA            NA         NA      NA <NA>   2007
+#> 5 Adelie  Torgersen           36.7          19.3      193    3450 fema…  2007
+#> 6 Adelie  Torgersen           39.3          20.6      190    3650 male   2007
 #> # … with 338 more rows, and abbreviated variable names ¹flipper_length_mm,
 #> #   ²body_mass_g</pre>
 </div>
@@ -287,14 +279,14 @@ deaths <- read_excel(deaths_path)
 #> • `` -> `...6`
 deaths
 #> # A tibble: 18 × 6
-#>   `Lots of people`             ...2       ...3  ...4     ...5          ...6     
-#>   <chr>                        <chr>      <chr> <chr>    <chr>         <chr>    
-#> 1 simply cannot resist writing <NA>       <NA>  <NA>     <NA>          some not…
-#> 2 at                           the        top   <NA>     of            their sp…
-#> 3 or                           merging    <NA>  <NA>     <NA>          cells    
-#> 4 Name                         Profession Age   Has kids Date of birth Date of …
-#> 5 David Bowie                  musician   69    TRUE     17175         42379    
-#> 6 Carrie Fisher                actor      60    TRUE     20749         42731    
+#>   `Lots of people`             ...2       ...3  ...4     ...5          ...6  
+#>   <chr>                        <chr>      <chr> <chr>    <chr>         <chr> 
+#> 1 simply cannot resist writing <NA>       <NA>  <NA>     <NA>          some …
+#> 2 at                           the        top   <NA>     of            their…
+#> 3 or                           merging    <NA>  <NA>     <NA>          cells 
+#> 4 Name                         Profession Age   Has kids Date of birth Date …
+#> 5 David Bowie                  musician   69    TRUE     17175         42379 
+#> 6 Carrie Fisher                actor      60    TRUE     20749         42731 
 #> # … with 12 more rows</pre>
 </div>
 <p>The top three rows and the bottom four rows are not part of the data frame.</p>
@@ -302,29 +294,30 @@ deaths
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4)
 #> # A tibble: 14 × 6
-#>   Name          Profession Age   `Has kids` `Date of birth`     `Date of death`
-#>   <chr>         <chr>      <chr> <chr>      <dttm>              <chr>          
-#> 1 David Bowie   musician   69    TRUE       1947-01-08 00:00:00 42379          
-#> 2 Carrie Fisher actor      60    TRUE       1956-10-21 00:00:00 42731          
-#> 3 Chuck Berry   musician   90    TRUE       1926-10-18 00:00:00 42812          
-#> 4 Bill Paxton   actor      61    TRUE       1955-05-17 00:00:00 42791          
-#> 5 Prince        musician   57    TRUE       1958-06-07 00:00:00 42481          
-#> 6 Alan Rickman  actor      69    FALSE      1946-02-21 00:00:00 42383          
-#> # … with 8 more rows</pre>
+#>   Name          Profession Age   `Has kids` `Date of birth`     Date of dea…¹
+#>   <chr>         <chr>      <chr> <chr>      <dttm>              <chr>        
+#> 1 David Bowie   musician   69    TRUE       1947-01-08 00:00:00 42379        
+#> 2 Carrie Fisher actor      60    TRUE       1956-10-21 00:00:00 42731        
+#> 3 Chuck Berry   musician   90    TRUE       1926-10-18 00:00:00 42812        
+#> 4 Bill Paxton   actor      61    TRUE       1955-05-17 00:00:00 42791        
+#> 5 Prince        musician   57    TRUE       1958-06-07 00:00:00 42481        
+#> 6 Alan Rickman  actor      69    FALSE      1946-02-21 00:00:00 42383        
+#> # … with 8 more rows, and abbreviated variable name ¹`Date of death`</pre>
 </div>
 <p>We could also set <code>n_max</code> to omit the extraneous rows at the bottom.</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4, n_max = 10)
 #> # A tibble: 10 × 6
-#>   Name          Profession   Age Has k…¹ `Date of birth`     `Date of death`    
-#>   <chr>         <chr>      <dbl> <lgl>   <dttm>              <dttm>             
-#> 1 David Bowie   musician      69 TRUE    1947-01-08 00:00:00 2016-01-10 00:00:00
-#> 2 Carrie Fisher actor         60 TRUE    1956-10-21 00:00:00 2016-12-27 00:00:00
-#> 3 Chuck Berry   musician      90 TRUE    1926-10-18 00:00:00 2017-03-18 00:00:00
-#> 4 Bill Paxton   actor         61 TRUE    1955-05-17 00:00:00 2017-02-25 00:00:00
-#> 5 Prince        musician      57 TRUE    1958-06-07 00:00:00 2016-04-21 00:00:00
-#> 6 Alan Rickman  actor         69 FALSE   1946-02-21 00:00:00 2016-01-14 00:00:00
-#> # … with 4 more rows, and abbreviated variable name ¹`Has kids`</pre>
+#>   Name          Profe…¹   Age Has k…² `Date of birth`     `Date of death`    
+#>   <chr>         <chr>   <dbl> <lgl>   <dttm>              <dttm>             
+#> 1 David Bowie   musici…    69 TRUE    1947-01-08 00:00:00 2016-01-10 00:00:00
+#> 2 Carrie Fisher actor      60 TRUE    1956-10-21 00:00:00 2016-12-27 00:00:00
+#> 3 Chuck Berry   musici…    90 TRUE    1926-10-18 00:00:00 2017-03-18 00:00:00
+#> 4 Bill Paxton   actor      61 TRUE    1955-05-17 00:00:00 2017-02-25 00:00:00
+#> 5 Prince        musici…    57 TRUE    1958-06-07 00:00:00 2016-04-21 00:00:00
+#> 6 Alan Rickman  actor      69 FALSE   1946-02-21 00:00:00 2016-01-14 00:00:00
+#> # … with 4 more rows, and abbreviated variable names ¹Profession,
+#> #   ²`Has kids`</pre>
 </div>
 <p>Another approach is using cell ranges. In Excel, the top left cell is <code>A1</code>. As you move across columns to the right, the cell label moves down the alphabet, i.e. <code>B1</code>, <code>C1</code>, etc. And as you move down a column, the number in the cell label increases, i.e. <code>A2</code>, <code>A3</code>, etc.</p>
 <p>The data we want to read in starts in cell <code>A5</code> and ends in cell <code>F15</code>. In spreadsheet notation, this is <code>A5:F15</code>.</p>
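The cell range described above can be passed to `read_excel()` through its `range` argument. This is a minimal sketch, assuming `deaths_path` points at the `deaths.xlsx` example workbook bundled with readxl. The hunk does not show how `deaths_path` is defined, so `readxl_example()` is used here as a stand-in.

```r
library(readxl)

# Assumption: deaths_path refers to the example workbook that ships with
# readxl; the diff above does not show its definition.
deaths_path <- readxl_example("deaths.xlsx")

# Read only the rectangle from A5 (the header row) through F15
# (the last data row), skipping the notes above and below the table.
deaths <- read_excel(deaths_path, range = "A5:F15")
deaths
```

Because `range` fixes both the starting cell and the extent of the rectangle, it takes precedence over `skip` and `n_max`, so neither is needed here.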