Minor edits during read thru of whole game viz (#1144)
* Add bit on visualize and Quarto
* Minor edits during readthru
* Fix typo
parent 5e47710f81
commit 2983376224
|
@@ -29,7 +29,7 @@ library(tidyverse)
|
||||||
That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
|
That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
|
||||||
It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
|
It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
|
||||||
|
|
||||||
If you run this code and get the error message "there is no package called 'tidyverse'", you'll need to first install it, then run `library()` once again.
|
If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
@@ -54,7 +54,7 @@ Nonlinear?
|
||||||
|
|
||||||
You can test your answer with the `mpg` **data frame** found in ggplot2 (a.k.a. `ggplot2::mpg`).
|
You can test your answer with the `mpg` **data frame** found in ggplot2 (a.k.a. `ggplot2::mpg`).
|
||||||
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
|
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
|
||||||
`mpg` contains observations collected by the US Environmental Protection Agency on 38 car models.
|
`mpg` contains `r nrow(mpg)` observations collected by the US Environmental Protection Agency on `r mpg |> distinct(model) |> nrow()` car models.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
mpg
|
mpg
|
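The inline expressions introduced in this hunk replace a hard-coded count with values computed from the data. A minimal sketch of what they evaluate, assuming ggplot2 (which provides `mpg`) and dplyr are installed; the names `n_obs` and `n_models` are illustrative and not part of the commit:

```r
library(ggplot2)  # provides the mpg data frame
library(dplyr)    # provides distinct()

# Number of rows (observations) in mpg
n_obs <- nrow(mpg)

# Number of distinct car models, computed the same way as the inline expression
n_models <- mpg |> distinct(model) |> nrow()

n_obs
n_models
```

Computing the counts inline keeps the prose in step with the data if the dataset ever changes.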
||||||
|
@@ -62,16 +62,16 @@ mpg
|
||||||
|
|
||||||
Among the variables in `mpg` are:
|
Among the variables in `mpg` are:
|
||||||
|
|
||||||
1. `displ`, a car's engine size, in liters.
|
1. `displ`: a car's engine size, in liters.
|
||||||
|
|
||||||
2. `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg).
|
2. `hwy`: a car's fuel efficiency on the highway, in miles per gallon (mpg).
|
||||||
A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
|
A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
|
||||||
|
|
||||||
To learn more about `mpg`, open its help page by running `?mpg`.
|
To learn more about `mpg`, open its help page by running `?mpg`.
|
||||||
|
|
||||||
### Creating a ggplot
|
### Creating a ggplot
|
||||||
|
|
||||||
To plot `mpg`, run this code to put `displ` on the x-axis and `hwy` on the y-axis:
|
To plot `mpg`, run this code to put `displ` on the x-axis, `hwy` on the y-axis, and represent each observation with a point:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
|
@@ -88,6 +88,7 @@ Does this confirm or refute your hypothesis about fuel efficiency and engine siz
|
||||||
|
|
||||||
With ggplot2, you begin a plot with the function `ggplot()`.
|
With ggplot2, you begin a plot with the function `ggplot()`.
|
||||||
`ggplot()` creates a coordinate system that you can add layers to.
|
`ggplot()` creates a coordinate system that you can add layers to.
|
||||||
|
You can think of it like an empty canvas you'll paint the rest of your plot on, layer by layer.
|
||||||
The first argument of `ggplot()` is the dataset to use in the graph.
|
The first argument of `ggplot()` is the dataset to use in the graph.
|
||||||
So `ggplot(data = mpg)` creates an empty graph, but it's not very interesting so we won't show it here.
|
So `ggplot(data = mpg)` creates an empty graph, but it's not very interesting so we won't show it here.
|
||||||
|
|
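The "empty canvas" description in this hunk can be sketched as follows, assuming ggplot2 is installed. The object names `canvas` and `scatter` are illustrative only, and inspecting `$layers` is just one way to see that a layer has been added:

```r
library(ggplot2)

# An empty canvas: the data is attached, but no layers have been added yet
canvas <- ggplot(data = mpg)

# Add a layer of points on top of the canvas
scatter <- canvas + geom_point(mapping = aes(x = displ, y = hwy))

length(canvas$layers)   # number of layers on the empty canvas
length(scatter$layers)  # number of layers after adding geom_point()
```

Printing `canvas` draws a blank panel, while printing `scatter` draws the points.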
||||||
|
@@ -151,7 +152,8 @@ How can you explain these cars?
|
||||||
|
|
||||||
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||||
geom_point() +
|
geom_point() +
|
||||||
geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 2.2)
|
geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 1.6) +
|
||||||
|
geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 3, shape = "circle open")
|
||||||
```
|
```
|
||||||
|
|
||||||
Let's hypothesize that the cars are hybrids.
|
Let's hypothesize that the cars are hybrids.
|
||||||
|
@@ -211,7 +213,7 @@ These cars don't seem like hybrids, and are, in fact, sports cars!
|
||||||
Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage.
|
Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage.
|
||||||
In hindsight, these cars were unlikely to be hybrids since they have large engines.
|
In hindsight, these cars were unlikely to be hybrids since they have large engines.
|
||||||
|
|
||||||
In the above example, we mapped `class` to the color aesthetic, but we could have mapped `class` to the size aesthetic in the same way.
|
In the above example, we mapped `class` to the `color` aesthetic, but we could have mapped `class` to the `size` aesthetic in the same way.
|
||||||
In this case, the exact size of each point would reveal its class affiliation.
|
In this case, the exact size of each point would reveal its class affiliation.
|
||||||
We get a *warning* here: mapping an unordered variable (`class`) to an ordered aesthetic (`size`) is generally not a good idea because it implies a ranking that does not in fact exist.
|
We get a *warning* here: mapping an unordered variable (`class`) to an ordered aesthetic (`size`) is generally not a good idea because it implies a ranking that does not in fact exist.
|
||||||
|
|
||||||
|
@@ -227,7 +229,7 @@ ggplot(data = mpg) +
|
||||||
geom_point(mapping = aes(x = displ, y = hwy, size = class))
|
geom_point(mapping = aes(x = displ, y = hwy, size = class))
|
||||||
```
|
```
|
||||||
|
|
||||||
Similarly, we could have mapped `class` to the *alpha* aesthetic, which controls the transparency of the points, or to the *shape* aesthetic, which controls the shape of the points.
|
Similarly, we could have mapped `class` to the `alpha` aesthetic, which controls the transparency of the points, or to the `shape` aesthetic, which controls the shape of the points.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| layout-ncol: 2
|
#| layout-ncol: 2
|
||||||
|
@@ -329,8 +331,7 @@ ggplot(shapes, aes(x, y)) +
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. What's gone wrong with this code?
|
1. Why did the following code not result in a plot with blue points?
|
||||||
Why are the points not blue?
|
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| fig-alt: >
|
#| fig-alt: >
|
||||||
|
@@ -386,7 +387,7 @@ Don't worry if the help doesn't seem that helpful - instead skip down to the exa
|
||||||
|
|
||||||
If that doesn't help, carefully read the error message.
|
If that doesn't help, carefully read the error message.
|
||||||
Sometimes the answer will be buried there!
|
Sometimes the answer will be buried there!
|
||||||
But when you're new to R, the answer might be in the error message but you don't yet know how to understand it.
|
But when you're new to R, even if the answer is in the error message, you might not yet know how to understand it.
|
||||||
Another great tool is Google: try googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
|
Another great tool is Google: try googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
|
||||||
|
|
||||||
## Facets
|
## Facets
|
||||||
|
|
|
@@ -7,6 +7,8 @@ Welcome to the second edition of "R for Data Science".
|
||||||
- The first part is renamed to "whole game" to reflect the entire data science cycle.
|
- The first part is renamed to "whole game" to reflect the entire data science cycle.
|
||||||
It gains a new chapter that briefly introduces the basics of reading data from csv files.
|
It gains a new chapter that briefly introduces the basics of reading data from csv files.
|
||||||
|
|
||||||
|
- We've added a new part called visualize.
|
||||||
|
|
||||||
- The wrangle part is now transform and gains new chapters on numbers, logical vectors, and missing values.
|
- The wrangle part is now transform and gains new chapters on numbers, logical vectors, and missing values.
|
||||||
These were previously parts of the data transformation chapter, but needed much more room.
|
These were previously parts of the data transformation chapter, but needed much more room.
|
||||||
|
|
||||||
|
@@ -19,6 +21,8 @@ Welcome to the second edition of "R for Data Science".
|
||||||
|
|
||||||
- We've switched from the magrittr pipe to the base pipe.
|
- We've switched from the magrittr pipe to the base pipe.
|
||||||
|
|
||||||
|
- The communicate part now features writing computational documents with Quarto.
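For the switch from the magrittr pipe to the base pipe noted in the list above, a minimal sketch using only base R (the base pipe `|>` needs R 4.1.0 or later). The vector `values` and the names `piped` and `nested` are illustrative and not taken from the book:

```r
# Base pipe: the left-hand side becomes the first argument of the
# call on the right-hand side
values <- c(1, 4, 9)
piped <- values |> sqrt() |> sum()

# The same computation written as nested calls
nested <- sum(sqrt(values))

identical(piped, nested)
```

With magrittr loaded, `values %>% sqrt() %>% sum()` would give the same result; the base pipe needs no extra package.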
|
||||||
|
|
||||||
## Acknowledgements {.unnumbered}
|
## Acknowledgements {.unnumbered}
|
||||||
|
|
||||||
*TO DO: Add acknowledgements.*
|
*TO DO: Add acknowledgements.*
|
||||||
|
|