Fix code language
		@@ -16,7 +16,7 @@ Excel</h1>
 | 
			
		||||
Prerequisites</h2>
 | 
			
		||||
<p>In this chapter, you’ll learn how to load data from Excel spreadsheets in R with the <strong>readxl</strong> package. This package is not part of the core tidyverse, so you need to load it explicitly, but it is installed automatically when you install the tidyverse package.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">library(readxl)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">library(readxl)
 | 
			
		||||
library(tidyverse)</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p><strong>xlsx</strong> and <strong>XLConnect</strong> can be used for reading data from and writing data to Excel spreadsheets. However, both packages require Java to be installed on your machine, along with the rJava package. Because that installation can be difficult, we recommend the alternative packages introduced in this chapter.</p>
 | 
			
		||||
@@ -49,11 +49,11 @@ Reading spreadsheets</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>The first argument to <code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code> is the path to the file to read.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">students <- read_excel("data/students.xlsx")</pre>
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">students <- read_excel("data/students.xlsx")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p><code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code> will read the file in as a tibble.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">students
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">students
 | 
			
		||||
#> # A tibble: 6 × 5
 | 
			
		||||
#>   `Student ID` `Full Name`      favourite.food     mealPlan            AGE  
 | 
			
		||||
#>          <dbl> <chr>            <chr>              <chr>               <chr>
 | 
			
		||||
@@ -68,7 +68,7 @@ Reading spreadsheets</h2>
 | 
			
		||||
<ol type="1"><li>
 | 
			
		||||
<p>The column names are all over the place. You can provide column names that follow a consistent format using the <code>col_names</code> argument; we recommend <code>snake_case</code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(
 | 
			
		||||
  "data/students.xlsx",
 | 
			
		||||
  col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age")
 | 
			
		||||
)
 | 
			
		||||
@@ -85,7 +85,7 @@ Reading spreadsheets</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>Unfortunately, this didn’t quite do the trick. You now have the variable names we want, but what was previously the header row now shows up as the first observation in the data. You can explicitly skip that row using the <code>skip</code> argument.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(
 | 
			
		||||
  "data/students.xlsx",
 | 
			
		||||
  col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
 | 
			
		||||
  skip = 1
 | 
			
		||||
@@ -104,7 +104,7 @@ Reading spreadsheets</h2>
 | 
			
		||||
<li>
 | 
			
		||||
<p>In the <code>favourite_food</code> column, one of the observations is <code>N/A</code>, which stands for “not available” but it’s currently not recognized as an <code>NA</code> (note the contrast between this <code>N/A</code> and the age of the fourth student in the list). You can specify which character strings should be recognized as <code>NA</code>s with the <code>na</code> argument. By default, only <code>""</code> (empty string, or, in the case of reading from a spreadsheet, an empty cell) is recognized as an <code>NA</code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(
 | 
			
		||||
  "data/students.xlsx",
 | 
			
		||||
  col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
 | 
			
		||||
  skip = 1,
 | 
			
		||||
@@ -124,7 +124,7 @@ Reading spreadsheets</h2>
 | 
			
		||||
<li>
 | 
			
		||||
<p>One other remaining issue is that <code>age</code> is read in as a character variable, but it really should be numeric. Just like with <code><a href="https://readr.tidyverse.org/reference/read_delim.html">read_csv()</a></code> and friends for reading data from flat files, you can supply a <code>col_types</code> argument to <code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code> and specify the column types for the variables you read in. The syntax is a bit different, though. Your options are <code>"skip"</code>, <code>"guess"</code>, <code>"logical"</code>, <code>"numeric"</code>, <code>"date"</code>, <code>"text"</code> or <code>"list"</code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(
 | 
			
		||||
  "data/students.xlsx",
 | 
			
		||||
  col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
 | 
			
		||||
  skip = 1,
 | 
			
		||||
@@ -144,7 +144,7 @@ Reading spreadsheets</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>However, this didn’t quite produce the desired result either. By specifying that <code>age</code> should be numeric, we have turned the one cell with the non-numeric entry (which had the value <code>five</code>) into an <code>NA</code>. In this case, we should read age in as <code>"text"</code> and then make the change once the data is loaded in R.</p>
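<p>The difference between a blanket numeric conversion and a targeted fix can be sketched with a plain character vector. The values in <code>age_text</code> are a hypothetical stand-in for the <code>age</code> column read as <code>"text"</code>, not the actual contents of the spreadsheet, and the sketch uses base R only.</p>

```r
# Hypothetical stand-in for the age column as it arrives when read as "text"
age_text <- c("4", "five", "7", NA, "5", "6")

# Blanket conversion: "five" cannot be parsed as a number, so it becomes NA
# (as.numeric() would also emit a coercion warning, silenced here)
age_blanket <- suppressWarnings(as.numeric(age_text))

# Targeted fix: replace the known spelled-out value first, then convert
age_fixed <- as.numeric(ifelse(age_text == "five", "5", age_text))

print(age_blanket)
print(age_fixed)
```

<p>The blanket conversion adds a second missing value, while the targeted fix keeps the only missing value that was genuinely missing in the input.</p>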
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">students <- read_excel(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">students <- read_excel(
 | 
			
		||||
  "data/students.xlsx",
 | 
			
		||||
  col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
 | 
			
		||||
  skip = 1,
 | 
			
		||||
@@ -187,7 +187,7 @@ Reading individual sheets</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>You can read a single sheet from a spreadsheet with the <code>sheet</code> argument in <code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel("data/penguins.xlsx", sheet = "Torgersen Island")
 | 
			
		||||
#> # A tibble: 52 × 8
 | 
			
		||||
#>   species island    bill_length_mm     bill_dep…¹ flipp…² body_…³ sex    year
 | 
			
		||||
#>   <chr>   <chr>     <chr>              <chr>      <chr>   <chr>   <chr> <dbl>
 | 
			
		||||
@@ -202,7 +202,7 @@ Reading individual sheets</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>Some variables that appear to contain numerical data are read in as characters due to the character string <code>"NA"</code> not being recognized as a true <code>NA</code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">penguins_torgersen <- read_excel("data/penguins.xlsx", sheet = "Torgersen Island", na = "NA")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">penguins_torgersen <- read_excel("data/penguins.xlsx", sheet = "Torgersen Island", na = "NA")
 | 
			
		||||
 | 
			
		||||
penguins_torgersen
 | 
			
		||||
#> # A tibble: 52 × 8
 | 
			
		||||
@@ -219,17 +219,17 @@ penguins_torgersen
 | 
			
		||||
</div>
 | 
			
		||||
<p>However, we cheated here a bit. We looked inside the Excel spreadsheet, which is not a recommended workflow. Instead, you can use <code><a href="https://readxl.tidyverse.org/reference/excel_sheets.html">excel_sheets()</a></code> to get information on all sheets in an Excel spreadsheet, and then read the one(s) you’re interested in.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">excel_sheets("data/penguins.xlsx")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">excel_sheets("data/penguins.xlsx")
 | 
			
		||||
#> [1] "Torgersen Island" "Biscoe Island"    "Dream Island"</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p>Once you know the names of the sheets, you can read them in individually with <code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">penguins_biscoe <- read_excel("data/penguins.xlsx", sheet = "Biscoe Island", na = "NA")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">penguins_biscoe <- read_excel("data/penguins.xlsx", sheet = "Biscoe Island", na = "NA")
 | 
			
		||||
penguins_dream  <- read_excel("data/penguins.xlsx", sheet = "Dream Island", na = "NA")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p>In this case the full penguins dataset is spread across three sheets in the spreadsheet. Each sheet has the same number of columns but different numbers of rows.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">dim(penguins_torgersen)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">dim(penguins_torgersen)
 | 
			
		||||
#> [1] 52  8
 | 
			
		||||
dim(penguins_biscoe)
 | 
			
		||||
#> [1] 168   8
 | 
			
		||||
@@ -238,7 +238,7 @@ dim(penguins_dream)
 | 
			
		||||
</div>
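<p>When a workbook has many sheets, the sheet-by-sheet reads can also be scripted. The following is a minimal sketch, assuming that every sheet shares the same columns. The workbook, its sheet names (<code>island_a</code>, <code>island_b</code>), and its values are hypothetical and are created in a temporary file with <strong>writexl</strong> so the example does not depend on <code>data/penguins.xlsx</code>.</p>

```r
library(readxl)
library(writexl)

# Hypothetical stand-in data: two small sheets with identical columns
sheets_in <- list(
  island_a = data.frame(species = c("Adelie", "Adelie"), year = c(2007, 2008)),
  island_b = data.frame(species = c("Gentoo", "Gentoo", "Gentoo"), year = c(2007, 2008, 2009))
)

# A named list of data frames is written as one sheet per list element
path <- tempfile(fileext = ".xlsx")
write_xlsx(sheets_in, path)

# Discover the sheet names instead of typing them by hand
sheet_names <- excel_sheets(path)

# Read every sheet with the same settings and keep the sheet name as the list name
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s, na = "NA"))
names(all_sheets) <- sheet_names

# Stack the sheets; .id records which sheet each row came from
combined <- dplyr::bind_rows(all_sheets, .id = "sheet")
print(combined)
```

<p>Keeping the sheet name in a column makes it possible to trace each row back to its source sheet after the sheets are combined.</p>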
 | 
			
		||||
<p>We can put them together with <code><a href="https://dplyr.tidyverse.org/reference/bind_rows.html">bind_rows()</a></code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
 | 
			
		||||
penguins
 | 
			
		||||
#> # A tibble: 344 × 8
 | 
			
		||||
#>   species island    bill_length_mm bill_depth_mm flippe…¹ body_…² sex    year
 | 
			
		||||
@@ -269,7 +269,7 @@ Reading part of a sheet</h2>
 | 
			
		||||
</div>
 | 
			
		||||
<p>This spreadsheet is one of the example spreadsheets provided in the readxl package. You can use the <code><a href="https://readxl.tidyverse.org/reference/readxl_example.html">readxl_example()</a></code> function to locate the spreadsheet on your system in the directory where the package is installed. This function returns the path to the spreadsheet, which you can use in <code><a href="https://readxl.tidyverse.org/reference/read_excel.html">read_excel()</a></code> as usual.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">deaths_path <- readxl_example("deaths.xlsx")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">deaths_path <- readxl_example("deaths.xlsx")
 | 
			
		||||
deaths <- read_excel(deaths_path)
 | 
			
		||||
#> New names:
 | 
			
		||||
#> • `` -> `...2`
 | 
			
		||||
@@ -292,7 +292,7 @@ deaths
 | 
			
		||||
<p>The top three rows and the bottom four rows are not part of the data frame.</p>
 | 
			
		||||
<p>We could skip the extraneous rows at the top with <code>skip</code>. Note that we set <code>skip = 4</code>: the first four rows of the spreadsheet hold notes, and the fifth row contains the column names.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(deaths_path, skip = 4)
 | 
			
		||||
#> # A tibble: 14 × 6
 | 
			
		||||
#>   Name          Profession Age   `Has kids` `Date of birth`     Date of dea…¹
 | 
			
		||||
#>   <chr>         <chr>      <chr> <chr>      <dttm>              <chr>        
 | 
			
		||||
@@ -306,7 +306,7 @@ deaths
 | 
			
		||||
</div>
 | 
			
		||||
<p>We could also set <code>n_max</code> to omit the extraneous rows at the bottom.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, skip = 4, n_max = 10)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(deaths_path, skip = 4, n_max = 10)
 | 
			
		||||
#> # A tibble: 10 × 6
 | 
			
		||||
#>   Name          Profe…¹   Age Has k…² `Date of birth`     `Date of death`    
 | 
			
		||||
#>   <chr>         <chr>   <dbl> <lgl>   <dttm>              <dttm>             
 | 
			
		||||
@@ -324,19 +324,19 @@ deaths
 | 
			
		||||
<ul><li>
 | 
			
		||||
<p>Supply this information to the <code>range</code> argument:</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, range = "A5:F15")</pre>
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(deaths_path, range = "A5:F15")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
</li>
 | 
			
		||||
<li>
 | 
			
		||||
<p>Specify rows:</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, range = cell_rows(c(5, 15)))</pre>
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(deaths_path, range = cell_rows(c(5, 15)))</pre>
 | 
			
		||||
</div>
 | 
			
		||||
</li>
 | 
			
		||||
<li>
 | 
			
		||||
<p>Specify cells that mark the top-left and bottom-right corners of the data – the top-left corner, <code>A5</code>, translates to <code>c(5, 1)</code> (5th row down, 1st column) and the bottom-right corner, <code>F15</code>, translates to <code>c(15, 6)</code>:</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel(deaths_path, range = cell_limits(c(5, 1), c(15, 6)))</pre>
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel(deaths_path, range = cell_limits(c(5, 1), c(15, 6)))</pre>
 | 
			
		||||
</div>
 | 
			
		||||
</li>
 | 
			
		||||
</ul><p>If you have control over the sheet, an even better way is to create a “named range”. This is useful within Excel because named ranges make repeated formulas easier to create, and they have some useful properties for creating dynamic charts and graphs as well. Even if you’re not working in Excel, named ranges can be useful for identifying which cells to read into R. In the example above, the table we’re reading in is named <code>Table1</code>, so we can read it in with the following.</p>
 | 
			
		||||
@@ -369,7 +369,7 @@ Data not in cell values</h2>
 | 
			
		||||
Writing to Excel</h2>
 | 
			
		||||
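<p>Let’s create a small data frame that we can then write out. Note that <code>item</code> is a factor and <code>quantity</code> is numeric; as written, <code>c(10, 5, 8)</code> creates a double vector, and storing integers would require <code>10L</code>-style literals.</p>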
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">bake_sale <- tibble(
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">bake_sale <- tibble(
 | 
			
		||||
  item     = factor(c("brownie", "cupcake", "cookie")),
 | 
			
		||||
  quantity = c(10, 5, 8)
 | 
			
		||||
)
 | 
			
		||||
@@ -384,7 +384,7 @@ bake_sale
 | 
			
		||||
</div>
 | 
			
		||||
<p>You can write data back to disk as an Excel file using the <code><a href="https://docs.ropensci.org/writexl/reference/write_xlsx.html">write_xlsx()</a></code> function from the <strong>writexl</strong> package.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">library(writexl)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">library(writexl)
 | 
			
		||||
write_xlsx(bake_sale, path = "data/bake-sale.xlsx")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p><a href="#fig-bake-sale-excel" data-type="xref">#fig-bake-sale-excel</a> shows what the data looks like in Excel. Note that column names are included and bolded. These can be turned off by setting the <code>col_names</code> and <code>format_headers</code> arguments to <code>FALSE</code>.</p>
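<p>As a minimal sketch of those two arguments, the following writes the same kind of data frame without a header row and reads it back. The temporary file path is hypothetical, chosen so that <code>data/bake-sale.xlsx</code> is not overwritten, and the data frame is rebuilt here so the block stands on its own.</p>

```r
library(readxl)
library(writexl)

# Same shape as the bake_sale example, rebuilt so this block is self-contained
bake_sale <- data.frame(
  item     = factor(c("brownie", "cupcake", "cookie")),
  quantity = c(10, 5, 8)
)

# Hypothetical output location: a temporary file rather than data/bake-sale.xlsx
plain_path <- tempfile(fileext = ".xlsx")

# Write the values only: no header row, so there is nothing to format in bold
write_xlsx(bake_sale, path = plain_path, col_names = FALSE, format_headers = FALSE)

# The file has no header row, so tell read_excel() not to use row 1 as names
plain <- read_excel(plain_path, col_names = FALSE)
print(plain)
```

<p>Because the file carries no names, <code>read_excel()</code> has to generate placeholder column names, which is one reason to keep the header row unless a downstream tool requires its absence.</p>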
 | 
			
		||||
@@ -398,7 +398,7 @@ write_xlsx(bake_sale, path = "data/bake-sale.xlsx")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p>Just like reading from a CSV, information on data type is lost when we read the data back in. This makes Excel files unreliable for caching interim results as well. For alternatives, see <a href="#sec-writing-to-a-file" data-type="xref">#sec-writing-to-a-file</a>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">read_excel("data/bake-sale.xlsx")
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">read_excel("data/bake-sale.xlsx")
 | 
			
		||||
#> # A tibble: 3 × 2
 | 
			
		||||
#>   item    quantity
 | 
			
		||||
#>   <chr>      <dbl>
 | 
			
		||||
@@ -414,7 +414,7 @@ Formatted output</h2>
 | 
			
		||||
<p>The writexl package is a lightweight solution for writing a simple Excel spreadsheet, but if you’re interested in additional features like writing to specific sheets within a spreadsheet and styling, you will want to use the <strong>openxlsx</strong> package. Note that this package is not part of the tidyverse, so the functions and workflows may feel unfamiliar. For example, function names are camelCase, multiple functions can’t be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse. However, this is OK. As your R learning and usage expands outside of this book, you will encounter lots of different styles used in various R packages that you might need to use to accomplish specific goals in R. A good way of familiarizing yourself with the coding style used in a new package is to run the examples provided in the function documentation to get a feel for the syntax and the output formats, and to read any vignettes that come with the package.</p>
 | 
			
		||||
<p>Below we show how to write a spreadsheet with three sheets, one for each species of penguins in the <code>penguins</code> data frame.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">library(openxlsx)
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">library(openxlsx)
 | 
			
		||||
library(palmerpenguins)
 | 
			
		||||
 | 
			
		||||
# Create a workbook (spreadsheet)
 | 
			
		||||
@@ -444,7 +444,7 @@ writeDataTable(
 | 
			
		||||
</div>
 | 
			
		||||
<p>This creates a workbook object:</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">penguins_species
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">penguins_species
 | 
			
		||||
#> A Workbook object.
 | 
			
		||||
#>  
 | 
			
		||||
#> Worksheets:
 | 
			
		||||
@@ -464,7 +464,7 @@ writeDataTable(
 | 
			
		||||
</div>
 | 
			
		||||
<p>And we can write this to disk with <code><a href="https://rdrr.io/pkg/openxlsx/man/saveWorkbook.html">saveWorkbook()</a></code>.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
<pre data-type="programlisting" data-code-language="downlit">saveWorkbook(penguins_species, "data/penguins-species.xlsx")</pre>
 | 
			
		||||
<pre data-type="programlisting" data-code-language="r">saveWorkbook(penguins_species, "data/penguins-species.xlsx")</pre>
 | 
			
		||||
</div>
 | 
			
		||||
<p>The resulting spreadsheet is shown in <a href="#fig-penguins-species" data-type="xref">#fig-penguins-species</a>. By default, openxlsx formats the data as an Excel table.</p>
 | 
			
		||||
<div class="cell">
 | 
			
		||||
 
 | 
			
		||||