A round of edits for length (#1278)

* Move workflow section into Quarto chapter

* Delete removed chapter

* - Hide figures in exercises
- Change eval: false to fig-show: hide

* Remove

* Fix color scale, closes #1243

* Fix plot size

* Reduce image size + formatted excel discussion

* Shorten URLs where possible

* Fix text running off the page on PDF

* Address figure sizing issue

* Re-org removes the need for this reference

* Remove one plot to reduce redundancy

* Read thru

* Minor edits

* Add ggthemes
Mine Cetinkaya-Rundel
2023-02-13 02:22:17 -05:00
committed by GitHub
parent be10801648
commit 18419626ed
12 changed files with 151 additions and 265 deletions

@@ -55,6 +55,7 @@ For the rest of the chapter we will focus on using `read_excel()`.
```{r}
#| label: fig-students-excel
#| echo: false
#| fig-width: 5
#| fig-cap: >
#| Spreadsheet called students.xlsx in Excel.
#| fig-alt: >
@@ -386,6 +387,7 @@ These can be turned off by setting `col_names` and `format_headers` arguments to
```{r}
#| label: fig-bake-sale-excel
#| echo: false
#| fig-width: 5
#| fig-cap: >
#| Spreadsheet called bake_sale.xlsx in Excel.
#| fig-alt: >
@@ -405,84 +407,22 @@ read_excel("data/bake-sale.xlsx")
### Formatted output
The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
We won't go into the details of using this package here, but we recommend reading <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive discussion on further formatting functionality for data written from R to Excel with openxlsx.
Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
However, this is ok.
As your R learning and usage expands outside of this book you will encounter lots of different styles used in various R packages that you might need to use to accomplish specific goals in R.
As your R learning and usage expands outside of this book you will encounter lots of different styles used in various R packages that you might use to accomplish specific goals in R.
A good way of familiarizing yourself with the coding style used in a new package is to run the examples provided in function documentation to get a feel for the syntax and the output formats as well as reading any vignettes that might come with the package.
Below we show how to write a spreadsheet with three sheets, one for each species of penguins in the `penguins` data frame.
```{r}
#| message: false
library(openxlsx)
library(palmerpenguins)
# Create a workbook (spreadsheet)
penguins_species <- createWorkbook()
# Add three sheets to the spreadsheet
addWorksheet(penguins_species, sheetName = "Adelie")
addWorksheet(penguins_species, sheetName = "Gentoo")
addWorksheet(penguins_species, sheetName = "Chinstrap")
# Write data to each sheet
writeDataTable(
penguins_species,
sheet = "Adelie",
x = penguins |> filter(species == "Adelie")
)
writeDataTable(
penguins_species,
sheet = "Gentoo",
x = penguins |> filter(species == "Gentoo")
)
writeDataTable(
penguins_species,
sheet = "Chinstrap",
x = penguins |> filter(species == "Chinstrap")
)
```
This creates a workbook object:
```{r}
penguins_species
```
And we can write this to disk with `saveWorkbook()`.
```{r}
#| eval: false
saveWorkbook(penguins_species, "data/penguins-species.xlsx")
```
The resulting spreadsheet is shown in @fig-penguins-species.
By default, openxlsx formats the data as an Excel table.
```{r}
#| label: fig-penguins-species
#| echo: false
#| fig-cap: >
#| Spreadsheet called penguins-species.xlsx in Excel.
#| fig-alt: >
#| A look at the penguins spreadsheet in Excel. The spreadsheet has
#| three sheets: Adelie, Gentoo, and Chinstrap.
knitr::include_graphics("screenshots/import-spreadsheets-penguins-species.png")
```
See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive discussion on further formatting functionality for data written from R to Excel with openxlsx.
### Exercises
1. In an Excel file, create the following dataset and save it as `survey.xlsx`.
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1yc5gL-a2OOBr8M7B3IsDNX5uR17vBHOyWZq6xSTG2G8/edit?usp=sharing).
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1yc5gL-a2OOBr8M7B3IsDNX5uR17vBHOyWZq6xSTG2G8).
```{r}
#| echo: false
#| fig-width: 4
#| fig-alt: >
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows.
#| The group column has two values: 1 (spanning 7 merged rows) and 2
@@ -512,10 +452,11 @@ See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive
```
2. In another Excel file, create the following dataset and save it as `roster.xlsx`.
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/edit?usp=sharing).
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE).
```{r}
#| echo: false
#| fig-width: 4
#| fig-alt: >
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows. The
#| group column has two values: 1 (spanning 7 merged rows) and 2 (spanning
@@ -540,7 +481,7 @@ See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive
```
3. In a new Excel file, create the following dataset and save it as `sales.xlsx`.
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1oCqdXUNO8JR3Pca8fHfiz_WXWxMuZAp3YiYFaKze5V0/edit?usp=sharing).
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1oCqdXUNO8JR3Pca8fHfiz_WXWxMuZAp3YiYFaKze5V0).
```{r}
#| echo: false
@@ -647,7 +588,8 @@ gs4_deauth()
```
```{r}
students <- read_sheet("https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/edit?usp=sharing")
students_url <- "https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w"
students <- read_sheet(students_url)
```
`read_sheet()` will read the file in as a tibble.
@@ -660,7 +602,7 @@ Just like we did with `read_excel()`, we can supply column names, NA strings, an
```{r}
students <- read_sheet(
"https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/edit?usp=sharing",
students_url,
col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
skip = 1,
na = c("", "N/A"),
@@ -681,13 +623,14 @@ It's also possible to read individual sheets from Google Sheets as well.
Let's read the penguins Google Sheet at <https://pos.it/r4ds-penguins>, and specifically the "Torgersen Island" sheet in it.
```{r}
read_sheet("https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY/edit?usp=sharing", sheet = "Torgersen Island")
penguins_url <- "https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY"
read_sheet(penguins_url, sheet = "Torgersen Island")
```
You can obtain a list of all sheets within a Google Sheet with `sheet_names()`:
```{r}
sheet_names("https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY/edit?usp=sharing")
sheet_names(penguins_url)
```
Finally, just like with `read_excel()`, we can read in a portion of a Google Sheet by defining a `range` in `read_sheet()`.
@@ -740,7 +683,7 @@ For further authentication details, we recommend reading the documentation googl
#| echo: false
#| message: false
read_sheet("https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/edit#gid=0") |>
read_sheet("https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/") |>
fill(group, subgroup) |>
print(n = 12)
```