A round of edits for length (#1278)
* Move workflow section into Quarto chapter
* Delete removed chapter
* - Hide figures in exercises
  - Change eval: false to fig-show: hide
* Remove
* Fix color scale, closes #1243
* Fix plot size
* Reduce image size + formatted excel discussion
* Shorten URLs where possible
* Fix text running off the page on PDF
* Address figure sizing issue
* Re-org removes the need for this reference
* Remove one plot to reduce redundancy
* Read thru
* Minor edits
* Add ggthemes
parent
be10801648
commit
18419626ed
|
@ -18,6 +18,7 @@ Imports:
|
||||||
gapminder,
|
gapminder,
|
||||||
ggrepel,
|
ggrepel,
|
||||||
ggridges,
|
ggridges,
|
||||||
|
ggthemes,
|
||||||
hexbin,
|
hexbin,
|
||||||
janitor,
|
janitor,
|
||||||
Lahman,
|
Lahman,
|
||||||
|
|
|
@ -71,7 +71,6 @@ book:
|
||||||
chapters:
|
chapters:
|
||||||
- quarto.qmd
|
- quarto.qmd
|
||||||
- quarto-formats.qmd
|
- quarto-formats.qmd
|
||||||
- quarto-workflow.qmd
|
|
||||||
|
|
||||||
format:
|
format:
|
||||||
html:
|
html:
|
||||||
|
|
35
base-R.qmd
|
@ -3,24 +3,25 @@
|
||||||
```{r}
|
```{r}
|
||||||
#| results: "asis"
|
#| results: "asis"
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
|
||||||
source("_common.R")
|
source("_common.R")
|
||||||
status("complete")
|
status("complete")
|
||||||
```
|
```
|
||||||
|
|
||||||
To finish off the programming section, we're going to give you a quick tour of the most important base R functions that we don't otherwise discuss in the book.
|
To finish off the programming section, we're going to give you a quick tour of the most important base R functions that we don't otherwise discuss in the book.
|
||||||
These tools are particularly useful as you do more programming and will help you read code that you'll encounter in the wild.
|
These tools are particularly useful as you do more programming and will help you read code you'll encounter in the wild.
|
||||||
|
|
||||||
This is a good place to remind you that the tidyverse is not the only way to solve data science problems.
|
This is a good place to remind you that the tidyverse is not the only way to solve data science problems.
|
||||||
We teach the tidyverse in this book because tidyverse packages share a common design philosophy, which increases the consistency across functions, making each new function or package a little easier to learn and use.
|
We teach the tidyverse in this book because tidyverse packages share a common design philosophy, increasing the consistency across functions, and making each new function or package a little easier to learn and use.
|
||||||
It's not possible to use the tidyverse without using base R, so we've actually already taught you a **lot** of base R functions: from `library()` to load packages, to `sum()` and `mean()` for numeric summaries, to the factor, date, and POSIXct data types, and of course all the basic operators like `+`, `-`, `/`, `*`, `|`, `&`, and `!`.
|
It's not possible to use the tidyverse without using base R, so we've actually already taught you a **lot** of base R functions: from `library()` to load packages, to `sum()` and `mean()` for numeric summaries, to the factor, date, and POSIXct data types, and of course all the basic operators like `+`, `-`, `/`, `*`, `|`, `&`, and `!`.
|
||||||
What we haven't focused on so far is base R workflows, so we will highlight a few of those in this chapter.
|
What we haven't focused on so far is base R workflows, so we will highlight a few of those in this chapter.
|
||||||
|
|
||||||
After you read this book you'll learn other approaches to the same problems using base R, data.table, and other packages.
|
After you read this book, you'll learn other approaches to the same problems using base R, data.table, and other packages.
|
||||||
You'll certainly encounter these other approaches when you start reading R code written by other people, particularly if you're using StackOverflow.
|
You'll undoubtedly encounter these other approaches when you start reading R code written by others, particularly if you're using StackOverflow.
|
||||||
It's 100% okay to write code that uses a mix of approaches, and don't let anyone tell you otherwise!
|
It's 100% okay to write code that uses a mix of approaches, and don't let anyone tell you otherwise!
|
||||||
|
|
||||||
In this chapter, we'll focus on four big topics: subsetting with `[`, subsetting with `[[` and `$`, the apply family of functions, and `for` loops.
|
In this chapter, we'll focus on four big topics: subsetting with `[`, subsetting with `[[` and `$`, the apply family of functions, and `for` loops.
|
||||||
To finish off, we'll briefly discuss two important plotting functions.
|
To finish off, we'll briefly discuss two essential plotting functions.
|
||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
|
@ -39,7 +40,7 @@ We'll then help you cement that knowledge by showing how various dplyr verbs are
|
||||||
|
|
||||||
### Subsetting vectors
|
### Subsetting vectors
|
||||||
|
|
||||||
There are five main types of things that you can subset a vector with, i.e. that can be the `i` in `x[i]`:
|
There are five main types of things that you can subset a vector with, i.e., that can be the `i` in `x[i]`:
|
||||||
|
|
||||||
1. **A vector of positive integers**.
|
1. **A vector of positive integers**.
|
||||||
Subsetting with positive integers keeps the elements at those positions:
|
Subsetting with positive integers keeps the elements at those positions:
|
||||||
|
@ -76,7 +77,7 @@ There are five main types of things that you can subset a vector with, i.e. that
|
||||||
x[x %% 2 == 0]
|
x[x %% 2 == 0]
|
||||||
```
|
```
|
||||||
|
|
||||||
Note that, unlike `filter()`, `NA` indices will be included in the output as `NA`s.
|
Unlike `filter()`, `NA` indices will be included in the output as `NA`s.
|
||||||
|
|
||||||
4. **A character vector**.
|
4. **A character vector**.
|
||||||
If you have a named vector, you can subset it with a character vector:
|
If you have a named vector, you can subset it with a character vector:
|
||||||
|
@ -90,7 +91,7 @@ There are five main types of things that you can subset a vector with, i.e. that
|
||||||
|
|
||||||
5. **Nothing**.
|
5. **Nothing**.
|
||||||
The final type of subsetting is nothing, `x[]`, which returns the complete `x`.
|
The final type of subsetting is nothing, `x[]`, which returns the complete `x`.
|
||||||
This is not useful for subsetting vectors, but as we'll see shortly it is useful when subsetting 2d structures like tibbles.
|
This is not useful for subsetting vectors, but as we'll see shortly, it is useful when subsetting 2d structures like tibbles.
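To make the `NA` behavior of logical subsetting concrete, here is a minimal base R sketch. The vector `x` and its values are made up for illustration and are not taken from the chapter; `which()` is shown as one possible way to drop the `NA` positions, not as the chapter's recommended approach.

```r
# Hypothetical vector, for illustration only
x <- c(10, 3, NA, 5, 8, 1, NA)

# Logical subsetting: an NA in the index produces an NA in the result
with_na <- x[x > 5]

# which() converts the logical vector to positions and skips the NAs,
# which is closer to what filter() does
without_na <- x[which(x > 5)]

# Subsetting with nothing returns the complete vector
everything <- x[]

print(with_na)
print(without_na)
```

The difference matters mostly when the result feeds into a summary: a stray `NA` from logical subsetting will propagate through functions like `sum()` unless you remove it.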
|
||||||
|
|
||||||
### Subsetting data frames
|
### Subsetting data frames
|
||||||
|
|
||||||
|
@ -122,7 +123,7 @@ We'll come back to `$` shortly, but you should be able to guess what `df$x` does
|
||||||
We need to use it here because `[` doesn't use tidy evaluation, so you need to be explicit about the source of the `x` variable.
|
We need to use it here because `[` doesn't use tidy evaluation, so you need to be explicit about the source of the `x` variable.
|
||||||
|
|
||||||
There's an important difference between tibbles and data frames when it comes to `[`.
|
There's an important difference between tibbles and data frames when it comes to `[`.
|
||||||
In this book we've mostly used tibbles, which *are* data frames, but they tweak some older behaviors to make your life a little easier.
|
In this book, we've mainly used tibbles, which *are* data frames, but they tweak some behaviors to make your life a little easier.
|
||||||
In most places, you can use "tibble" and "data frame" interchangeably, so when we want to draw particular attention to R's built-in data frame, we'll write `data.frame`.
|
In most places, you can use "tibble" and "data frame" interchangeably, so when we want to draw particular attention to R's built-in data frame, we'll write `data.frame`.
|
||||||
If `df` is a `data.frame`, then `df[, cols]` will return a vector if `cols` selects a single column and a data frame if it selects more than one column.
|
If `df` is a `data.frame`, then `df[, cols]` will return a vector if `cols` selects a single column and a data frame if it selects more than one column.
|
||||||
If `df` is a tibble, then `[` will always return a tibble.
|
If `df` is a tibble, then `[` will always return a tibble.
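The difference can be checked directly. The sketch below assumes the tibble package is installed; the names `df1` and `tb1` and their contents are illustrative choices rather than the chapter's own objects.

```r
library(tibble)

# One base data frame and one tibble with the same single column
df1 <- data.frame(x = 1:3)
tb1 <- tibble(x = 1:3)

# A data.frame drops to a bare vector when a single column is selected
df_single <- df1[, "x"]

# drop = FALSE asks `[` to keep the data frame structure
df_kept <- df1[, "x", drop = FALSE]

# A tibble stays a tibble regardless of how many columns are selected
tb_single <- tb1[, "x"]

print(class(df_single))
print(class(df_kept))
print(class(tb_single))
```

Code that must work with both kinds of input can pass `drop = FALSE` explicitly, so the result is a data frame in either case.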
|
||||||
|
@ -143,12 +144,13 @@ df1[, "x" , drop = FALSE]
|
||||||
|
|
||||||
### dplyr equivalents
|
### dplyr equivalents
|
||||||
|
|
||||||
A number of dplyr verbs are special cases of `[`:
|
Several dplyr verbs are special cases of `[`:
|
||||||
|
|
||||||
- `filter()` is equivalent to subsetting the rows with a logical vector, taking care to exclude missing values:
|
- `filter()` is equivalent to subsetting the rows with a logical vector, taking care to exclude missing values:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| results: false
|
#| results: false
|
||||||
|
|
||||||
df <- tibble(
|
df <- tibble(
|
||||||
x = c(2, 3, 1, 1, NA),
|
x = c(2, 3, 1, 1, NA),
|
||||||
y = letters[1:5],
|
y = letters[1:5],
|
||||||
|
@ -166,18 +168,20 @@ A number of dplyr verbs are special cases of `[`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| results: false
|
#| results: false
|
||||||
|
|
||||||
df |> arrange(x, y)
|
df |> arrange(x, y)
|
||||||
|
|
||||||
# same as
|
# same as
|
||||||
df[order(df$x, df$y), ]
|
df[order(df$x, df$y), ]
|
||||||
```
|
```
|
||||||
|
|
||||||
You can use `order(decreasing = TRUE)` to sort all columns in descending order or `-rank(col)` to individually sort columns in decreasing order.
|
You can use `order(decreasing = TRUE)` to sort all columns in descending order or `-rank(col)` to sort columns in decreasing order individually.
|
||||||
|
|
||||||
- Both `select()` and `relocate()` are similar to subsetting the columns with a character vector:
|
- Both `select()` and `relocate()` are similar to subsetting the columns with a character vector:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| results: false
|
#| results: false
|
||||||
|
|
||||||
df |> select(x, z)
|
df |> select(x, z)
|
||||||
|
|
||||||
# same as
|
# same as
|
||||||
|
@ -196,6 +200,7 @@ df |>
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| results: false
|
#| results: false
|
||||||
|
|
||||||
# same as
|
# same as
|
||||||
df |> subset(x > 1, c(y, z))
|
df |> subset(x > 1, c(y, z))
|
||||||
```
|
```
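The equivalences in this section can be compared side by side in base R. This is a sketch under assumptions: the data frame below is a hypothetical stand-in with fixed values for `z`, and `manual`, `via_subset`, and `sorted` are names introduced only for this example.

```r
# Hypothetical data, loosely modeled on the df used in this section
df <- data.frame(
  x = c(2, 3, 1, 1, NA),
  y = letters[1:5],
  z = c(0.5, 0.1, 0.9, 0.3, 0.7)
)

# Filtering rows and selecting columns in one step with `[`;
# !is.na() is needed because `[` keeps rows whose condition is NA
manual <- df[!is.na(df$x) & df$x > 1, c("y", "z")]

# subset() drops rows where the condition is NA on its own
via_subset <- subset(df, x > 1, c(y, z))

# Both approaches should give the same rows and columns
print(all.equal(manual, via_subset))

# Sort by x ascending and y descending: negating rank() reverses
# only that column, and NA values in x are placed last
sorted <- df[order(df$x, -rank(df$y)), ]
print(sorted)
```

The explicit `!is.na()` is the part that is easiest to forget when translating `filter()` into `[`.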
|
||||||
|
@ -206,7 +211,7 @@ This function was the inspiration for much of dplyr's syntax.
|
||||||
|
|
||||||
1. Create functions that take a vector as input and return:
|
1. Create functions that take a vector as input and return:
|
||||||
|
|
||||||
a. The elements at even numbered positions.
|
a. The elements at even-numbered positions.
|
||||||
b. Every element except the last value.
|
b. Every element except the last value.
|
||||||
c. Only even values (and no missing values).
|
c. Only even values (and no missing values).
|
||||||
|
|
||||||
|
@ -244,7 +249,7 @@ tb$z <- tb$x + tb$y
|
||||||
tb
|
tb
|
||||||
```
|
```
|
||||||
|
|
||||||
There are a number of other base R approaches to creating new columns, including `transform()`, `with()`, and `within()`.
|
There are several other base R approaches to creating new columns, including `transform()`, `with()`, and `within()`.
|
||||||
Hadley collected a few examples at <https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf>.
|
Hadley collected a few examples at <https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf>.
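As a rough illustration of how these three functions differ, the sketch below adds the same column three ways. The data frame `df` is hypothetical (the chapter's own example uses a tibble named `tb`), and the result names are introduced only for this example.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(x = 1:3, y = c(10L, 20L, 30L))

# transform() returns a copy of df with the new column added
df_transform <- transform(df, z = x + y)

# within() evaluates the assignment inside the data frame
# and returns a modified copy
df_within <- within(df, z <- x + y)

# with() only evaluates the expression, so the result still
# has to be assigned back with $
df_with <- df
df_with$z <- with(df, x + y)

print(df_transform)
```

All three leave the original `df` untouched, so you need to assign the result if you want to keep the new column.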
|
||||||
|
|
||||||
Using `$` directly is convenient when performing quick summaries.
|
Using `$` directly is convenient when performing quick summaries.
|
||||||
|
@ -428,6 +433,7 @@ The basic structure of a `for` loop looks like this:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
|
||||||
for (element in vector) {
|
for (element in vector) {
|
||||||
# do something with element
|
# do something with element
|
||||||
}
|
}
|
||||||
|
@ -438,6 +444,7 @@ For example, in @sec-save-database instead of using walk:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
|
||||||
paths |> walk(append_file)
|
paths |> walk(append_file)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -445,6 +452,7 @@ We could have used a `for` loop:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
|
||||||
for (path in paths) {
|
for (path in paths) {
|
||||||
append_file(path)
|
append_file(path)
|
||||||
}
|
}
|
||||||
|
@ -506,6 +514,7 @@ Here's a quick example from the diamonds dataset:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| dev: png
|
#| dev: png
|
||||||
|
|
||||||
hist(diamonds$carat)
|
hist(diamonds$carat)
|
||||||
|
|
||||||
plot(diamonds$carat, diamonds$price)
|
plot(diamonds$carat, diamonds$price)
|
||||||
|
|
|
@ -217,41 +217,6 @@ ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
|
||||||
```
|
```
|
||||||
|
|
||||||
Note the use of `hjust` and `vjust` to control the alignment of the label.
|
Note the use of `hjust` and `vjust` to control the alignment of the label.
|
||||||
@fig-just shows all nine possible combinations.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| label: fig-just
|
|
||||||
#| echo: false
|
|
||||||
#| fig-width: 4.5
|
|
||||||
#| fig-asp: 0.5
|
|
||||||
#| out-width: "60%"
|
|
||||||
#| fig-cap: >
|
|
||||||
#| All nine combinations of `hjust` and `vjust`.
|
|
||||||
#| fig-alt: >
|
|
||||||
#| A 1x1 grid. At (0,0) hjust is set to left and vjust is set to bottom.
|
|
||||||
#| At (0.5, 0) hjust is center and vjust is bottom and at (1, 0) hjust is
|
|
||||||
#| right and vjust is bottom. At (0, 0.5) hjust is left and vjust is
|
|
||||||
#| center, at (0.5, 0.5) hjust is center and vjust is center, and at (1, 0.5)
|
|
||||||
#| hjust is right and vjust is center. Finally, at (1, 0) hjust is left and
|
|
||||||
#| vjust is top, at (0.5, 1) hjust is center and vjust is top, and at (1, 1)
|
|
||||||
#| hjust is right and vjust is bottom.
|
|
||||||
|
|
||||||
vjust <- c(bottom = 0, center = 0.5, top = 1)
|
|
||||||
hjust <- c(left = 0, center = 0.5, right = 1)
|
|
||||||
|
|
||||||
df <- crossing(hj = names(hjust), vj = names(vjust)) |>
|
|
||||||
mutate(
|
|
||||||
y = vjust[vj],
|
|
||||||
x = hjust[hj],
|
|
||||||
label = paste0("hjust = '", hj, "'\n", "vjust = '", vj, "'")
|
|
||||||
)
|
|
||||||
|
|
||||||
ggplot(df, aes(x, y)) +
|
|
||||||
geom_point(color = "grey70", size = 5) +
|
|
||||||
geom_point(size = 0.5, color = "red") +
|
|
||||||
geom_text(aes(label = label, hjust = hj, vjust = vj), size = 4) +
|
|
||||||
labs(x = NULL, y = NULL)
|
|
||||||
```
|
|
||||||
|
|
||||||
However, the annotated plot we made above is hard to read because the labels overlap with each other and with the points.
|
However, the annotated plot we made above is hard to read because the labels overlap with each other and with the points.
|
||||||
We can make things a little better by switching to `geom_label()`, which draws a rectangle behind the text.
|
We can make things a little better by switching to `geom_label()`, which draws a rectangle behind the text.
|
||||||
|
@ -342,28 +307,9 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
If you want to place the text exactly on the borders of the plot, you can use `+Inf` and `-Inf`.
|
If you want to place the text exactly on the borders of the plot, you can set `displ = Inf` and `hwy = Inf` in the tibble above instead of using the calculated maximum values.
|
||||||
Since we're no longer computing the positions from `mpg`, we can use `tibble()` to create the data frame:
|
|
||||||
|
|
||||||
```{r}
|
We can alternatively add the annotation without creating a new data frame, using `annotate()`.
|
||||||
#| fig-alt: >
|
|
||||||
#| Scatterplot of highway fuel efficiency versus engine size of cars. On the
|
|
||||||
#| top right corner, flush against the corner, is an annotation that
|
|
||||||
#| reads "increasing engine size is related to decreasing fuel economy".
|
|
||||||
#| The text spans two lines.
|
|
||||||
|
|
||||||
label_info <- tibble(
|
|
||||||
displ = Inf,
|
|
||||||
hwy = Inf,
|
|
||||||
label = "Increasing engine size is \nrelated to decreasing fuel economy."
|
|
||||||
)
|
|
||||||
|
|
||||||
ggplot(mpg, aes(x = displ, y = hwy)) +
|
|
||||||
geom_point() +
|
|
||||||
geom_text(data = label_info, aes(label = label), vjust = "top", hjust = "right")
|
|
||||||
```
|
|
||||||
|
|
||||||
Alternatively, we can add the annotation without creating a new data frame, using `annotate()`.
|
|
||||||
This function adds a geom to a plot, but it doesn't map variables of a data frame to an aesthetic.
|
This function adds a geom to a plot, but it doesn't map variables of a data frame to an aesthetic.
|
||||||
The first argument of this function, `geom`, is the geometric object you want to use for annotation.
|
The first argument of this function, `geom`, is the geometric object you want to use for annotation.
|
||||||
|
|
||||||
|
@ -608,7 +554,7 @@ The theme setting `legend.position` controls where the legend is drawn:
|
||||||
```{r}
|
```{r}
|
||||||
#| layout-ncol: 2
|
#| layout-ncol: 2
|
||||||
#| fig-width: 4
|
#| fig-width: 4
|
||||||
#| fig-asp: 1
|
#| fig-height: 2
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Four scatterplots of highway fuel efficiency versus engine size of cars
|
#| Four scatterplots of highway fuel efficiency versus engine size of cars
|
||||||
#| where points are colored based on class of car. Clockwise, the legend
|
#| where points are colored based on class of car. Clockwise, the legend
|
||||||
|
@ -1059,6 +1005,7 @@ Finally, we have also customized the heights of the various components of our pa
|
||||||
Patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.
|
Patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
#| fig-width: 10
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Five plots laid out such that first two plots are next to each other. Plots
|
#| Five plots laid out such that first two plots are next to each other. Plots
|
||||||
#| three and four are underneath them. And the fifth plot stretches under them.
|
#| three and four are underneath them. And the fifth plot stretches under them.
|
||||||
|
|
|
@ -49,10 +49,11 @@ library(tidyverse)
|
||||||
|
|
||||||
You only need to install a package once, but you need to load it every time you start a new session.
|
You only need to install a package once, but you need to load it every time you start a new session.
|
||||||
|
|
||||||
In addition to tidyverse, we will also use the **palmerpenguins** package, which includes the `penguins` dataset containing body measurements for penguins on three islands in the Palmer Archipelago.
|
In addition to tidyverse, we will also use the **palmerpenguins** package, which includes the `penguins` dataset containing body measurements for penguins on three islands in the Palmer Archipelago, and the ggthemes package, which offers a colorblind safe color palette.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
library(palmerpenguins)
|
library(palmerpenguins)
|
||||||
|
library(ggthemes)
|
||||||
```
|
```
|
||||||
|
|
||||||
## First steps
|
## First steps
|
||||||
|
@ -128,7 +129,8 @@ ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
|
||||||
y = "Body mass (g)",
|
y = "Body mass (g)",
|
||||||
color = "Species",
|
color = "Species",
|
||||||
shape = "Species"
|
shape = "Species"
|
||||||
)
|
) +
|
||||||
|
scale_color_colorblind()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Creating a ggplot
|
### Creating a ggplot
|
||||||
|
@ -323,6 +325,7 @@ Note that the legend is automatically updated to reflect the different shapes of
|
||||||
And finally, we can improve the labels of our plot using the `labs()` function in a new layer.
|
And finally, we can improve the labels of our plot using the `labs()` function in a new layer.
|
||||||
Some of the arguments to `labs()` might be self-explanatory: `title` adds a title and `subtitle` adds a subtitle to the plot.
|
Some of the arguments to `labs()` might be self-explanatory: `title` adds a title and `subtitle` adds a subtitle to the plot.
|
||||||
Other arguments match the aesthetic mappings: `x` is the x-axis label, `y` is the y-axis label, and `color` and `shape` define the label for the legend.
|
Other arguments match the aesthetic mappings: `x` is the x-axis label, `y` is the y-axis label, and `color` and `shape` define the label for the legend.
|
||||||
|
In addition, we can improve the color palette to be colorblind safe with the `scale_color_colorblind()` function from the ggthemes package.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| warning: false
|
#| warning: false
|
||||||
|
@ -345,11 +348,10 @@ ggplot(
|
||||||
labs(
|
labs(
|
||||||
title = "Body mass and flipper length",
|
title = "Body mass and flipper length",
|
||||||
subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
|
subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
|
||||||
x = "Flipper length (mm)",
|
x = "Flipper length (mm)", y = "Body mass (g)",
|
||||||
y = "Body mass (g)",
|
color = "Species", shape = "Species"
|
||||||
color = "Species",
|
) +
|
||||||
shape = "Species"
|
scale_color_colorblind()
|
||||||
)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
We finally have a plot that perfectly matches our "ultimate goal"!
|
We finally have a plot that perfectly matches our "ultimate goal"!
|
||||||
|
|
Binary file not shown.
73
layers.qmd
|
@ -3,6 +3,7 @@
|
||||||
```{r}
|
```{r}
|
||||||
#| results: "asis"
|
#| results: "asis"
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
|
||||||
source("_common.R")
|
source("_common.R")
|
||||||
status("complete")
|
status("complete")
|
||||||
```
|
```
|
||||||
|
@ -205,6 +206,7 @@ In the next section we dive deeper into geoms.
|
||||||
2. Why did the following code not result in a plot with blue points?
|
2. Why did the following code not result in a plot with blue points?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
#| fig-show: hide
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Scatterplot of highway fuel efficiency versus engine size of cars
|
#| Scatterplot of highway fuel efficiency versus engine size of cars
|
||||||
#| that shows a negative association. All points are red and
|
#| that shows a negative association. All points are red and
|
||||||
|
@ -254,7 +256,7 @@ To change the geom in your plot, change the geom function that you add to `ggplo
|
||||||
For instance, to make the plots above, you can use this code:
|
For instance, to make the plots above, you can use this code:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| fig-show: hide
|
||||||
|
|
||||||
# Left
|
# Left
|
||||||
ggplot(mpg, aes(x = displ, y = hwy)) +
|
ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
|
@ -441,7 +443,8 @@ To learn more about any single geom, use the help (e.g. `?geom_smooth`).
|
||||||
2. Earlier in this chapter we used `show.legend` without explaining it:
|
2. Earlier in this chapter we used `show.legend` without explaining it:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| fig-show: hide
|
||||||
|
|
||||||
ggplot(mpg, aes(x = displ, y = hwy)) +
|
ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
geom_smooth(aes(color = drv), show.legend = FALSE)
|
geom_smooth(aes(color = drv), show.legend = FALSE)
|
||||||
```
|
```
|
||||||
|
@ -551,13 +554,11 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
1. What happens if you facet on a continuous variable?
|
1. What happens if you facet on a continuous variable?
|
||||||
|
|
||||||
2. What do the empty cells in the plot with `facet_grid(drv ~ cyl)` mean?
|
2. What do the empty cells in the plot with `facet_grid(drv ~ cyl)` mean?
|
||||||
How do they relate to this plot?
|
Run the following code.
|
||||||
|
How do they relate to the resulting plot?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-show: hide
|
||||||
#| Scatterplot of number of cycles versus type of drive train of cars.
|
|
||||||
#| The plot shows that there are no cars with 5 cylinders that are 4
|
|
||||||
#| wheel drive or with 4 or 5 cylinders that are front wheel drive.
|
|
||||||
|
|
||||||
ggplot(mpg) +
|
ggplot(mpg) +
|
||||||
geom_point(aes(x = drv, y = cyl))
|
geom_point(aes(x = drv, y = cyl))
|
||||||
|
@ -567,7 +568,7 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
What does `.` do?
|
What does `.` do?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| fig-show: hide
|
||||||
|
|
||||||
ggplot(mpg) +
|
ggplot(mpg) +
|
||||||
geom_point(aes(x = displ, y = hwy)) +
|
geom_point(aes(x = displ, y = hwy)) +
|
||||||
|
@ -581,7 +582,7 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
4. Take the first faceted plot in this section:
|
4. Take the first faceted plot in this section:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| fig-show: hide
|
||||||
|
|
||||||
ggplot(mpg) +
|
ggplot(mpg) +
|
||||||
geom_point(aes(x = displ, y = hwy)) +
|
geom_point(aes(x = displ, y = hwy)) +
|
||||||
|
@ -602,10 +603,7 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
What does this say about when to place a faceting variable across rows or columns?
|
What does this say about when to place a faceting variable across rows or columns?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-show: hide
|
||||||
#| Two faceted plots, both visualizing highway fuel efficiency versus
|
|
||||||
#| engine size of cars, faceted by drive train. In the top plot, facet
|
|
||||||
#| are organized across rows and in the second, across columns.
|
|
||||||
|
|
||||||
ggplot(mpg) +
|
ggplot(mpg) +
|
||||||
geom_point(aes(x = displ, y = hwy)) +
|
geom_point(aes(x = displ, y = hwy)) +
|
||||||
|
@ -616,13 +614,11 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
|
||||||
facet_grid(. ~ drv)
|
facet_grid(. ~ drv)
|
||||||
```
|
```
|
||||||
|
|
||||||
7. Recreate this plot using `facet_wrap()` instead of `facet_grid()`.
|
7. Recreate the following plot using `facet_wrap()` instead of `facet_grid()`.
|
||||||
How do the positions of the facet labels change?
|
How do the positions of the facet labels change?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-show: hide
|
||||||
#| Scatterplot of highway fuel efficiency versus engine size of cars,
|
|
||||||
#| faceted by type of drive train across rows.
|
|
||||||
|
|
||||||
ggplot(mpg) +
|
ggplot(mpg) +
|
||||||
geom_point(aes(x = displ, y = hwy)) +
|
geom_point(aes(x = displ, y = hwy)) +
|
||||||
|
@ -770,7 +766,7 @@ Each stat is a function, so you can get help in the usual way, e.g. `?stat_bin`.
|
||||||
In other words, what is the problem with these two graphs?
|
In other words, what is the problem with these two graphs?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| fig-show: hide
|
||||||
|
|
||||||
ggplot(diamonds, aes(x = cut, y = after_stat(prop))) +
|
ggplot(diamonds, aes(x = cut, y = after_stat(prop))) +
|
||||||
geom_bar()
|
geom_bar()
|
||||||
|
@ -785,7 +781,7 @@ You can color a bar chart using either the `color` aesthetic, or, more usefully,
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| layout-ncol: 2
|
#| layout-ncol: 2
|
||||||
#| fig-width: 4
|
#| fig-width: 5.5
|
||||||
#| fig-height: 2
|
#| fig-height: 2
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Two bar charts of cut of diamonds. In the first plot, the bars have colored
|
#| Two bar charts of cut of diamonds. In the first plot, the bars have colored
|
||||||
|
@ -822,7 +818,7 @@ If you don't want a stacked bar chart, you can use one of three other options: `
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| layout-ncol: 2
|
#| layout-ncol: 2
|
||||||
#| fig-width: 4
|
#| fig-width: 5.5
|
||||||
#| fig-height: 2
|
#| fig-height: 2
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Two segmented bar charts of cut of diamonds, where each bar is filled
|
#| Two segmented bar charts of cut of diamonds, where each bar is filled
|
||||||
|
@ -844,28 +840,26 @@ If you don't want a stacked bar chart, you can use one of three other options: `
|
||||||
- `position = "fill"` works like stacking, but makes each set of stacked bars the same height.
|
- `position = "fill"` works like stacking, but makes each set of stacked bars the same height.
|
||||||
This makes it easier to compare proportions across groups.
|
This makes it easier to compare proportions across groups.
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| fig-alt: >
|
|
||||||
#| Segmented bar chart of cut of diamonds, where each bar is filled with
|
|
||||||
#| colors for the levels of clarity. Height of each bar is 1 and heights
|
|
||||||
#| of the colored segments are proportional to the proportion of diamonds
|
|
||||||
#| with a given clarity level within a given cut level.
|
|
||||||
|
|
||||||
ggplot(diamonds, aes(x = cut, fill = clarity)) +
|
|
||||||
geom_bar(position = "fill")
|
|
||||||
```
|
|
||||||
|
|
||||||
- `position = "dodge"` places overlapping objects directly *beside* one another.
|
- `position = "dodge"` places overlapping objects directly *beside* one another.
|
||||||
This makes it easier to compare individual values.
|
This makes it easier to compare individual values.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
#| layout-ncol: 2
|
||||||
|
#| fig-width: 5.5
|
||||||
|
#| fig-height: 2
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| Dodged bar chart of cut of diamonds. Dodged bars are grouped by levels
|
#| On the left, segmented bar chart of cut of diamonds, where each bar is filled with
|
||||||
|
#| colors for the levels of clarity. Height of each bar is 1 and heights
|
||||||
|
#| of the colored segments are proportional to the proportion of diamonds
|
||||||
|
#| with a given clarity level within a given cut level.
|
||||||
|
#| On the right, dodged bar chart of cut of diamonds. Dodged bars are grouped by levels
|
||||||
#| of cut (fair, good, very good, premium, and ideal). In each group there
|
#| of cut (fair, good, very good, premium, and ideal). In each group there
|
||||||
#| are eight bars, one for each level of clarity, and filled with a
|
#| are eight bars, one for each level of clarity, and filled with a
|
||||||
#| different color for each level. Heights of these bars represent the
|
#| different color for each level. Heights of these bars represent the
|
||||||
#| number of diamonds with a given level of cut and clarity.
|
#| number of diamonds with a given level of cut and clarity.
|
||||||
|
|
||||||
|
ggplot(diamonds, aes(x = cut, fill = clarity)) +
|
||||||
|
geom_bar(position = "fill")
|
||||||
ggplot(diamonds, aes(x = cut, fill = clarity)) +
|
ggplot(diamonds, aes(x = cut, fill = clarity)) +
|
||||||
geom_bar(position = "dodge")
|
geom_bar(position = "dodge")
|
||||||
```
|
```
|
||||||
|
@ -909,14 +903,11 @@ To learn more about a position adjustment, look up the help page associated with
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. What is the problem with this plot?
|
1. What is the problem with the following plot?
|
||||||
How could you improve it?
|
How could you improve it?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-show: hide
|
||||||
#| Scatterplot of highway fuel efficiency versus city fuel efficiency
|
|
||||||
#| of cars that shows a positive association. The number of points
|
|
||||||
#| visible in this plot is less than the number of points in the dataset.
|
|
||||||
|
|
||||||
ggplot(mpg, aes(x = cty, y = hwy)) +
|
ggplot(mpg, aes(x = cty, y = hwy)) +
|
||||||
geom_point()
|
geom_point()
|
||||||
|
@ -988,16 +979,12 @@ There are two other coordinate systems that are occasionally helpful.
|
||||||
|
|
||||||
2. What's the difference between `coord_quickmap()` and `coord_map()`?
|
2. What's the difference between `coord_quickmap()` and `coord_map()`?
|
||||||
|
|
||||||
3. What does the plot below tell you about the relationship between city and highway mpg?
|
3. What does the following plot tell you about the relationship between city and highway mpg?
|
||||||
Why is `coord_fixed()` important?
|
Why is `coord_fixed()` important?
|
||||||
What does `geom_abline()` do?
|
What does `geom_abline()` do?
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-show: hide
|
||||||
#| Scatterplot of highway fuel efficiency versus engine size of cars that
|
|
||||||
#| shows a negative association. The plot also has a straight line that
|
|
||||||
#| follows the trend of the relationship between the variables but does not
|
|
||||||
#| go through the cloud of points, it is beneath it.
|
|
||||||
|
|
||||||
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
|
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
|
||||||
geom_point() +
|
geom_point() +
|
||||||
|
|
|
@ -1,66 +0,0 @@
|
||||||
# Quarto workflow {#sec-quarto-workflow}
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| results: "asis"
|
|
||||||
#| echo: false
|
|
||||||
source("_common.R")
|
|
||||||
status("complete")
|
|
||||||
```
|
|
||||||
|
|
||||||
Earlier, we discussed a basic workflow for capturing your R code where you work interactively in the *console*, then capture what works in the *script editor*.
|
|
||||||
Quarto brings together the console and the script editor, blurring the lines between interactive exploration and long-term code capture.
|
|
||||||
You can rapidly iterate within a chunk, editing and re-executing with Cmd/Ctrl + Shift + Enter.
|
|
||||||
When you're happy, you move on and start a new chunk.
|
|
||||||
|
|
||||||
Quarto is also important because it so tightly integrates prose and code.
|
|
||||||
This makes it a great **analysis notebook** because it lets you develop code and record your thoughts.
|
|
||||||
An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences.
|
|
||||||
It:
|
|
||||||
|
|
||||||
- Records what you did and why you did it.
|
|
||||||
Regardless of how great your memory is, if you don't record what you do, there will come a time when you have forgotten important details.
|
|
||||||
Write them down so you don't forget!
|
|
||||||
|
|
||||||
- Supports rigorous thinking.
|
|
||||||
You are more likely to come up with a strong analysis if you record your thoughts as you go, and continue to reflect on them.
|
|
||||||
This also saves you time when you eventually write up your analysis to share with others.
|
|
||||||
|
|
||||||
- Helps others understand your work.
|
|
||||||
It is rare to do data analysis by yourself, and you'll often be working as part of a team.
|
|
||||||
A lab notebook helps you share not only what you've done, but why you did it with your colleagues or lab mates.
|
|
||||||
|
|
||||||
Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks.
|
|
||||||
We've drawn on our own experiences and Colin Purrington's advice on lab notebooks (<https://colinpurrington.com/tips/lab-notebooks>) to come up with the following tips:
|
|
||||||
|
|
||||||
- Ensure each notebook has a descriptive title, an evocative file name, and a first paragraph that briefly describes the aims of the analysis.
|
|
||||||
|
|
||||||
- Use the YAML header date field to record the date you started working on the notebook:
|
|
||||||
|
|
||||||
``` yaml
|
|
||||||
date: 2016-08-23
|
|
||||||
```
|
|
||||||
|
|
||||||
Use ISO8601 YYYY-MM-DD format so that there's no ambiguity.
|
|
||||||
Use it even if you don't normally write dates that way!
|
|
||||||
|
|
||||||
- If you spend a lot of time on an analysis idea and it turns out to be a dead end, don't delete it!
|
|
||||||
Write up a brief note about why it failed and leave it in the notebook.
|
|
||||||
That will help you avoid going down the same dead end when you come back to the analysis in the future.
|
|
||||||
|
|
||||||
- Generally, you're better off doing data entry outside of R.
|
|
||||||
But if you do need to record a small snippet of data, clearly lay it out using `tibble::tribble()`.
|
|
||||||
|
|
||||||
- If you discover an error in a data file, never modify it directly, but instead write code to correct the value.
|
|
||||||
Explain why you made the fix.
|
|
||||||
|
|
||||||
- Before you finish for the day, make sure you can render the notebook.
|
|
||||||
If you're using caching, make sure to clear the caches.
|
|
||||||
That will let you fix any problems while the code is still fresh in your mind.
|
|
||||||
|
|
||||||
- If you want your code to be reproducible in the long-run (i.e. so you can come back to run it next month or next year), you'll need to track the versions of the packages that your code uses.
|
|
||||||
A rigorous approach is to use **renv**, <https://rstudio.github.io/renv/index.html>, which stores packages in your project directory.
|
|
||||||
A quick and dirty hack is to include a chunk that runs `sessionInfo()` --- that won't let you easily recreate your packages as they are today, but at least you'll know what they were.
|
|
||||||
|
|
||||||
- You are going to create many, many, many analysis notebooks over the course of your career.
|
|
||||||
How are you going to organize them so you can find them again in the future?
|
|
||||||
We recommend storing them in individual projects, and coming up with a good naming scheme.
|
|
66
quarto.qmd
|
@ -447,6 +447,7 @@ plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
#| fig-width: 4
|
#| fig-width: 4
|
||||||
|
#| out-width: "50%"
|
||||||
|
|
||||||
plot
|
plot
|
||||||
```
|
```
|
||||||
|
@ -454,6 +455,7 @@ plot
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
#| fig-width: 6
|
#| fig-width: 6
|
||||||
|
#| out-width: "50%"
|
||||||
|
|
||||||
plot
|
plot
|
||||||
```
|
```
|
||||||
|
@ -461,6 +463,7 @@ plot
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
#| fig-width: 8
|
#| fig-width: 8
|
||||||
|
#| out-width: "50%"
|
||||||
|
|
||||||
plot
|
plot
|
||||||
```
|
```
|
||||||
|
@ -515,9 +518,6 @@ Read the documentation for `?knitr::kable` to see the other ways in which you ca
|
||||||
For even deeper customization, consider the **gt**, **huxtable**, **reactable**, **kableExtra**, **xtable**, **stargazer**, **pander**, **tables**, and **ascii** packages.
|
For even deeper customization, consider the **gt**, **huxtable**, **reactable**, **kableExtra**, **xtable**, **stargazer**, **pander**, **tables**, and **ascii** packages.
|
||||||
Each provides a set of tools for returning formatted tables from R code.
|
Each provides a set of tools for returning formatted tables from R code.
|
||||||
|
|
||||||
There is also a rich set of options for controlling how figures are embedded.
|
|
||||||
You'll learn about these in @sec-graphics-communication.
|
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
<!--# TO DO: Add exercises -->
|
<!--# TO DO: Add exercises -->
|
||||||
|
@ -737,6 +737,66 @@ As with the bibliography field, your csl file should contain a path to the file.
|
||||||
Here we assume that the csl file is in the same directory as the .qmd file.
|
Here we assume that the csl file is in the same directory as the .qmd file.
|
||||||
A good place to find CSL style files for common bibliography styles is <https://github.com/citation-style-language/styles>.
|
A good place to find CSL style files for common bibliography styles is <https://github.com/citation-style-language/styles>.
|
||||||
|
|
||||||
|
## Workflow
|
||||||
|
|
||||||
|
Earlier, we discussed a basic workflow for capturing your R code where you work interactively in the *console*, then capture what works in the *script editor*.
|
||||||
|
Quarto brings together the console and the script editor, blurring the lines between interactive exploration and long-term code capture.
|
||||||
|
You can rapidly iterate within a chunk, editing and re-executing with Cmd/Ctrl + Shift + Enter.
|
||||||
|
When you're happy, you move on and start a new chunk.
|
||||||
|
|
||||||
|
Quarto is also important because it so tightly integrates prose and code.
|
||||||
|
This makes it a great **analysis notebook** because it lets you develop code and record your thoughts.
|
||||||
|
An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences.
|
||||||
|
It:
|
||||||
|
|
||||||
|
- Records what you did and why you did it.
|
||||||
|
Regardless of how great your memory is, if you don't record what you do, there will come a time when you have forgotten important details.
|
||||||
|
Write them down so you don't forget!
|
||||||
|
|
||||||
|
- Supports rigorous thinking.
|
||||||
|
You are more likely to come up with a strong analysis if you record your thoughts as you go, and continue to reflect on them.
|
||||||
|
This also saves you time when you eventually write up your analysis to share with others.
|
||||||
|
|
||||||
|
- Helps others understand your work.
|
||||||
|
It is rare to do data analysis by yourself, and you'll often be working as part of a team.
|
||||||
|
A lab notebook helps you share not only what you've done, but why you did it with your colleagues or lab mates.
|
||||||
|
|
||||||
|
Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks.
|
||||||
|
We've drawn on our own experiences and Colin Purrington's advice on lab notebooks (<https://colinpurrington.com/tips/lab-notebooks>) to come up with the following tips:
|
||||||
|
|
||||||
|
- Ensure each notebook has a descriptive title, an evocative file name, and a first paragraph that briefly describes the aims of the analysis.
|
||||||
|
|
||||||
|
- Use the YAML header date field to record the date you started working on the notebook:
|
||||||
|
|
||||||
|
``` yaml
|
||||||
|
date: 2016-08-23
|
||||||
|
```
|
||||||
|
|
||||||
|
Use ISO8601 YYYY-MM-DD format so that there's no ambiguity.
|
||||||
|
Use it even if you don't normally write dates that way!
|
||||||
|
|
||||||
|
- If you spend a lot of time on an analysis idea and it turns out to be a dead end, don't delete it!
|
||||||
|
Write up a brief note about why it failed and leave it in the notebook.
|
||||||
|
That will help you avoid going down the same dead end when you come back to the analysis in the future.
|
||||||
|
|
||||||
|
- Generally, you're better off doing data entry outside of R.
|
||||||
|
But if you do need to record a small snippet of data, clearly lay it out using `tibble::tribble()`.
|
||||||
|
|
||||||
|
- If you discover an error in a data file, never modify it directly, but instead write code to correct the value.
|
||||||
|
Explain why you made the fix.
|
||||||
|
|
||||||
|
- Before you finish for the day, make sure you can render the notebook.
|
||||||
|
If you're using caching, make sure to clear the caches.
|
||||||
|
That will let you fix any problems while the code is still fresh in your mind.
|
||||||
|
|
||||||
|
- If you want your code to be reproducible in the long-run (i.e. so you can come back to run it next month or next year), you'll need to track the versions of the packages that your code uses.
|
||||||
|
A rigorous approach is to use **renv**, <https://rstudio.github.io/renv/index.html>, which stores packages in your project directory.
|
||||||
|
A quick and dirty hack is to include a chunk that runs `sessionInfo()` --- that won't let you easily recreate your packages as they are today, but at least you'll know what they were.
|
||||||
|
|
||||||
|
- You are going to create many, many, many analysis notebooks over the course of your career.
|
||||||
|
How are you going to organize them so you can find them again in the future?
|
||||||
|
We recommend storing them in individual projects, and coming up with a good naming scheme.
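
Several of the tips above can be sketched in a single chunk. This is an illustrative sketch only, not part of the book's text: the `measurements` data frame, its column names, and the "mistyped" value are all hypothetical. It shows a small hand-entered dataset laid out with `tibble::tribble()`, a correction made in code with a comment explaining why, ISO8601 date strings parsed without ambiguity, and `sessionInfo()` recording package versions.

```r
library(tibble)

# Hypothetical hand-entered data; the site names and counts are made up.
measurements <- tribble(
  ~site, ~date,        ~count,
  "A",   "2016-08-23", 12,
  "B",   "2016-08-23", 170,
  "C",   "2016-08-24", 15
)

# Assumed error: suppose the count for site B was mistyped at the source
# (170 instead of 17). Fix it in code instead of editing the raw data,
# and leave a note explaining the fix.
measurements$count[measurements$site == "B"] <- 17

# ISO8601 strings (YYYY-MM-DD) parse unambiguously with as.Date().
measurements$date <- as.Date(measurements$date)

# Record the R and package versions used for this analysis.
sessionInfo()
```

Keeping the correction in code means the raw file stays untouched and the reasoning behind the fix travels with the notebook.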
|
||||||
|
|
||||||
## Learning more
|
## Learning more
|
||||||
|
|
||||||
Quarto is still relatively young, and is still growing rapidly.
|
Quarto is still relatively young, and is still growing rapidly.
|
||||||
|
|
|
@ -1,6 +1,10 @@
|
||||||
## Text formatting
|
## Text formatting
|
||||||
|
|
||||||
*italic* **bold** [underline]{.underline} ~~strikeout~~ [small caps]{.smallcaps} `code` superscript^2^ and subscript~2~
|
*italic* **bold** ~~strikeout~~ `code`
|
||||||
|
|
||||||
|
superscript^2^ subscript~2~
|
||||||
|
|
||||||
|
[underline]{.underline} [small caps]{.smallcaps}
|
||||||
|
|
||||||
## Headings
|
## Headings
|
||||||
|
|
||||||
|
|
Binary file not shown.
Before Width: | Height: | Size: 794 KiB |
|
@ -55,6 +55,7 @@ For the rest of the chapter we will focus on using `read_excel()`.
|
||||||
```{r}
|
```{r}
|
||||||
#| label: fig-students-excel
|
#| label: fig-students-excel
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
#| fig-width: 5
|
||||||
#| fig-cap: >
|
#| fig-cap: >
|
||||||
#| Spreadsheet called students.xlsx in Excel.
|
#| Spreadsheet called students.xlsx in Excel.
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
|
@ -386,6 +387,7 @@ These can be turned off by setting `col_names` and `format_headers` arguments to
|
||||||
```{r}
|
```{r}
|
||||||
#| label: fig-bake-sale-excel
|
#| label: fig-bake-sale-excel
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
#| fig-width: 5
|
||||||
#| fig-cap: >
|
#| fig-cap: >
|
||||||
#| Spreadsheet called bake_sale.xlsx in Excel.
|
#| Spreadsheet called bake_sale.xlsx in Excel.
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
|
@ -405,84 +407,22 @@ read_excel("data/bake-sale.xlsx")
|
||||||
### Formatted output
|
### Formatted output
|
||||||
|
|
||||||
The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
|
The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
|
||||||
|
We won't go into the details of using this package here, but we recommend reading <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive discussion on further formatting functionality for data written from R to Excel with openxlsx.
|
||||||
|
|
||||||
Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
|
Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
|
||||||
For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
|
For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
|
||||||
However, this is ok.
|
However, this is ok.
|
||||||
As your R learning and usage expands outside of this book you will encounter lots of different styles used in various R packages that you might need to use to accomplish specific goals in R.
|
As your R learning and usage expands outside of this book you will encounter lots of different styles used in various R packages that you might use to accomplish specific goals in R.
|
||||||
A good way of familiarizing yourself with the coding style used in a new package is to run the examples provided in function documentation to get a feel for the syntax and the output formats as well as reading any vignettes that might come with the package.
|
A good way of familiarizing yourself with the coding style used in a new package is to run the examples provided in function documentation to get a feel for the syntax and the output formats as well as reading any vignettes that might come with the package.
|
||||||
|
|
||||||
Below we show how to write a spreadsheet with three sheets, one for each species of penguins in the `penguins` data frame.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| message: false
|
|
||||||
|
|
||||||
library(openxlsx)
|
|
||||||
library(palmerpenguins)
|
|
||||||
|
|
||||||
# Create a workbook (spreadsheet)
|
|
||||||
penguins_species <- createWorkbook()
|
|
||||||
|
|
||||||
# Add three sheets to the spreadsheet
|
|
||||||
addWorksheet(penguins_species, sheetName = "Adelie")
|
|
||||||
addWorksheet(penguins_species, sheetName = "Gentoo")
|
|
||||||
addWorksheet(penguins_species, sheetName = "Chinstrap")
|
|
||||||
|
|
||||||
# Write data to each sheet
|
|
||||||
writeDataTable(
|
|
||||||
penguins_species,
|
|
||||||
sheet = "Adelie",
|
|
||||||
x = penguins |> filter(species == "Adelie")
|
|
||||||
)
|
|
||||||
writeDataTable(
|
|
||||||
penguins_species,
|
|
||||||
sheet = "Gentoo",
|
|
||||||
x = penguins |> filter(species == "Gentoo")
|
|
||||||
)
|
|
||||||
writeDataTable(
|
|
||||||
penguins_species,
|
|
||||||
sheet = "Chinstrap",
|
|
||||||
x = penguins |> filter(species == "Chinstrap")
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
This creates a workbook object:
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
penguins_species
|
|
||||||
```
|
|
||||||
|
|
||||||
And we can write this to disk with `saveWorkbook()`.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| eval: false
|
|
||||||
|
|
||||||
saveWorkbook(penguins_species, "data/penguins-species.xlsx")
|
|
||||||
```
|
|
||||||
|
|
||||||
The resulting spreadsheet is shown in @fig-penguins-species.
|
|
||||||
By default, openxlsx formats the data as an Excel table.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
#| label: fig-penguins-species
|
|
||||||
#| echo: false
|
|
||||||
#| fig-cap: >
|
|
||||||
#| Spreadsheet called penguins.xlsx in Excel.
|
|
||||||
#| fig-alt: >
|
|
||||||
#| A look at the penguins spreadsheet in Excel. The spreadsheet has
|
|
||||||
#| three sheets: Torgersen Island, Biscoe Island, and Dream Island.
|
|
||||||
|
|
||||||
knitr::include_graphics("screenshots/import-spreadsheets-penguins-species.png")
|
|
||||||
```
|
|
||||||
|
|
||||||
See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive discussion on further formatting functionality for data written from R to Excel with openxlsx.
|
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. In an Excel file, create the following dataset and save it as `survey.xlsx`.
|
1. In an Excel file, create the following dataset and save it as `survey.xlsx`.
|
||||||
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1yc5gL-a2OOBr8M7B3IsDNX5uR17vBHOyWZq6xSTG2G8/edit?usp=sharing).
|
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1yc5gL-a2OOBr8M7B3IsDNX5uR17vBHOyWZq6xSTG2G8).
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
#| fig-width: 4
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows.
|
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows.
|
||||||
#| The group column has two values: 1 (spanning 7 merged rows) and 2
|
#| The group column has two values: 1 (spanning 7 merged rows) and 2
|
||||||
|
@ -512,10 +452,11 @@ See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive
|
||||||
```
|
```
|
||||||
|
|
||||||
2. In another Excel file, create the following dataset and save it as `roster.xlsx`.
|
2. In another Excel file, create the following dataset and save it as `roster.xlsx`.
|
||||||
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/edit?usp=sharing).
|
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE).
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
#| fig-width: 4
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows. The
|
#| A spreadsheet with 3 columns (group, subgroup, and id) and 12 rows. The
|
||||||
#| group column has two values: 1 (spanning 7 merged rows) and 2 (spanning
|
#| group column has two values: 1 (spanning 7 merged rows) and 2 (spanning
|
||||||
|
@ -540,7 +481,7 @@ See <https://ycphs.github.io/openxlsx/articles/Formatting.html> for an extensive
|
||||||
```
|
```
|
||||||
|
|
||||||
3. In a new Excel file, create the following dataset and save it as `sales.xlsx`.
|
3. In a new Excel file, create the following dataset and save it as `sales.xlsx`.
|
||||||
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1oCqdXUNO8JR3Pca8fHfiz_WXWxMuZAp3YiYFaKze5V0/edit?usp=sharing).
|
Alternatively, you can download it as an Excel file from [here](https://docs.google.com/spreadsheets/d/1oCqdXUNO8JR3Pca8fHfiz_WXWxMuZAp3YiYFaKze5V0).
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| echo: false
|
#| echo: false
|
||||||
|
@ -647,7 +588,8 @@ gs4_deauth()
|
||||||
```
|
```
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
students <- read_sheet("https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/edit?usp=sharing")
|
students_url <- "https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w"
|
||||||
|
students <- read_sheet(students_url)
|
||||||
```
|
```
|
||||||
|
|
||||||
`read_sheet()` will read the file in as a tibble.
|
`read_sheet()` will read the file in as a tibble.
|
||||||
|
@ -660,7 +602,7 @@ Just like we did with `read_excel()`, we can supply column names, NA strings, an
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
students <- read_sheet(
|
students <- read_sheet(
|
||||||
"https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/edit?usp=sharing",
|
students_url,
|
||||||
col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
|
col_names = c("student_id", "full_name", "favourite_food", "meal_plan", "age"),
|
||||||
skip = 1,
|
skip = 1,
|
||||||
na = c("", "N/A"),
|
na = c("", "N/A"),
|
||||||
|
@ -681,13 +623,14 @@ It's also possible to read individual sheets from Google Sheets as well.
|
||||||
Let's read the penguins Google Sheet at <https://pos.it/r4ds-penguins>, and specifically the "Torgersen Island" sheet in it.
|
Let's read the penguins Google Sheet at <https://pos.it/r4ds-penguins>, and specifically the "Torgersen Island" sheet in it.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
read_sheet("https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY/edit?usp=sharing", sheet = "Torgersen Island")
|
penguins_url <- "https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY"
|
||||||
|
read_sheet(penguins_url, sheet = "Torgersen Island")
|
||||||
```
|
```
|
||||||
|
|
||||||
You can obtain a list of all sheets within a Google Sheet with `sheet_names()`:
|
You can obtain a list of all sheets within a Google Sheet with `sheet_names()`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
sheet_names("https://docs.google.com/spreadsheets/d/1aFu8lnD_g0yjF5O-K6SFgSEWiHPpgvFCF0NY9D6LXnY/edit?usp=sharing")
|
sheet_names(penguins_url)
|
||||||
```
|
```
|
||||||
|
|
||||||
Finally, just like with `read_excel()`, we can read in a portion of a Google Sheet by defining a `range` in `read_sheet()`.
|
Finally, just like with `read_excel()`, we can read in a portion of a Google Sheet by defining a `range` in `read_sheet()`.
|
||||||
|
@ -740,7 +683,7 @@ For further authentication details, we recommend reading the documentation googl
|
||||||
#| echo: false
|
#| echo: false
|
||||||
#| message: false
|
#| message: false
|
||||||
|
|
||||||
read_sheet("https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/edit#gid=0") |>
|
read_sheet("https://docs.google.com/spreadsheets/d/1LgZ0Bkg9d_NK8uTdP2uHXm07kAlwx8-Ictf8NocebIE/") |>
|
||||||
fill(group, subgroup) |>
|
fill(group, subgroup) |>
|
||||||
print(n = 12)
|
print(n = 12)
|
||||||
```
|
```
|
||||||
|
|