Tweak :: description and usage

Hadley Wickham 2022-08-08 13:57:08 -05:00
parent f37e134d74
commit 0c05d01794
3 changed files with 11 additions and 10 deletions

16
EDA.qmd
View File

@ -112,7 +112,7 @@ ggplot(data = diamonds, mapping = aes(x = cut)) +
``` ```
The height of the bars displays how many observations occurred with each x value. The height of the bars displays how many observations occurred with each x value.
You can compute these values manually with `dplyr::count()`: You can compute these values manually with `count()`:
```{r} ```{r}
diamonds |> diamonds |>
@ -136,7 +136,7 @@ ggplot(data = diamonds, mapping = aes(x = carat)) +
geom_histogram(binwidth = 0.5) geom_histogram(binwidth = 0.5)
``` ```
You can compute this by hand by combining `dplyr::count()` and `ggplot2::cut_width()`: You can compute this by hand by combining `count()` and `cut_width()`:
```{r} ```{r}
diamonds |> diamonds |>
@ -359,17 +359,17 @@ If you've encountered unusual values in your dataset, and simply want to move on
2. Instead, I recommend replacing the unusual values with missing values. 2. Instead, I recommend replacing the unusual values with missing values.
The easiest way to do this is to use `mutate()` to replace the variable with a modified copy. The easiest way to do this is to use `mutate()` to replace the variable with a modified copy.
You can use the `ifelse()` function to replace unusual values with `NA`: You can use the `if_else()` function to replace unusual values with `NA`:
```{r} ```{r}
diamonds2 <- diamonds |> diamonds2 <- diamonds |>
mutate(y = ifelse(y < 3 | y > 20, NA, y)) mutate(y = if_else(y < 3 | y > 20, NA, y))
``` ```
`ifelse()` has three arguments. `if_else()` has three arguments.
The first argument `test` should be a logical vector. The first argument `test` should be a logical vector.
The result will contain the value of the second argument, `yes`, when `test` is `TRUE`, and the value of the third argument, `no`, when it is false. The result will contain the value of the second argument, `yes`, when `test` is `TRUE`, and the value of the third argument, `no`, when it is false.
Alternatively to `if_else()`, use `dplyr::case_when()`. Alternatively to `if_else()`, use `case_when()`.
`case_when()` is particularly useful inside mutate when you want to create a new variable that relies on a complex combination of existing variables or would otherwise require multiple `if_else()` statements nested inside one another. `case_when()` is particularly useful inside mutate when you want to create a new variable that relies on a complex combination of existing variables or would otherwise require multiple `if_else()` statements nested inside one another.
Like R, ggplot2 subscribes to the philosophy that missing values should never silently go missing. Like R, ggplot2 subscribes to the philosophy that missing values should never silently go missing.
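As a rough illustration of the `if_else()` and `case_when()` approach described in this hunk, here is a minimal sketch. The `toy` tibble and the column names `y_if` and `y_case` are hypothetical stand-ins, not part of the commit. `NA_real_` is used instead of a bare `NA` on the assumption that some dplyr versions require the `true` and `false` values of `if_else()` to share a type.

```r
library(dplyr)

# Hypothetical toy data standing in for the `y` column of `diamonds`
toy <- tibble(y = c(0, 3.5, 5.2, 58.9))

toy2 <- toy |>
  mutate(
    # Replace implausible values with a missing value of the same type as `y`
    y_if = if_else(y < 3 | y > 20, NA_real_, y),
    # The same rule written as separate cases instead of one combined test
    y_case = case_when(
      y < 3 ~ NA_real_,
      y > 20 ~ NA_real_,
      TRUE ~ y
    )
  )
```

Both new columns should hold the same values here. `case_when()` tends to pay off once there are more than two or three conditions to distinguish.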
@ -397,10 +397,12 @@ ggplot(data = diamonds2, mapping = aes(x = x, y = y)) +
``` ```
Other times you want to understand what makes observations with missing values different to observations with recorded values. Other times you want to understand what makes observations with missing values different to observations with recorded values.
For example, in `nycflights13::flights`, missing values in the `dep_time` variable indicate that the flight was cancelled. For example, in `nycflights13::flights`[^eda-1], missing values in the `dep_time` variable indicate that the flight was cancelled.
So you might want to compare the scheduled departure times for cancelled and non-cancelled flights. So you might want to compare the scheduled departure times for cancelled and non-cancelled flights.
You can do this by making a new variable with `is.na()`. You can do this by making a new variable with `is.na()`.
[^eda-1]: Remember that when we need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()` or `package::dataset`.
```{r} ```{r}
#| fig-alt: > #| fig-alt: >
#| A frequency polygon of scheduled departure times of flights. Two lines #| A frequency polygon of scheduled departure times of flights. Two lines

View File

@ -31,6 +31,8 @@ library(tidyverse)
Take careful note of the conflicts message that's printed when you load the tidyverse. Take careful note of the conflicts message that's printed when you load the tidyverse.
It tells you that dplyr overwrites some functions in base R. It tells you that dplyr overwrites some functions in base R.
If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()` and `stats::lag()`. If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()` and `stats::lag()`.
So far we've mostly ignored which package a function comes from because most of the time it doesn't matter.
However, knowing the package can help you find documentation and related functions, so when we need to be precise about which package a function comes from, we'll use the same syntax as R: `packagename::functionname()`.
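To make the `packagename::functionname()` syntax concrete, the following is a small sketch rather than text from the book. The vector `x` and the result names `moving_avg` and `big` are made up for illustration.

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5)

# With dplyr loaded, a bare filter() refers to dplyr::filter(), so the
# stats version (a linear filter over a numeric series) needs its full name.
# Here it computes a centred three-point moving average.
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# The dplyr version keeps rows of a data frame that match a condition.
# Spelling out the package is optional here, but removes any ambiguity.
big <- dplyr::filter(tibble(x = x), x > 3)
```

The two functions share a name but do unrelated things, which is why the conflicts message is worth reading.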
### nycflights13 ### nycflights13

View File

@ -42,9 +42,6 @@ library(tidyverse)
You only need to install a package once, but you need to reload it every time you start a new session. You only need to install a package once, but you need to reload it every time you start a new session.
If we need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()`.
For example, `ggplot2::ggplot()` tells you explicitly that we're using the `ggplot()` function from the ggplot2 package.
## First steps ## First steps
Let's use our first graph to answer a question: Do cars with big engines use more fuel than cars with small engines? Let's use our first graph to answer a question: Do cars with big engines use more fuel than cars with small engines?