Tweak :: description and usage
parent f37e134d74
commit 0c05d01794
EDA.qmd | 16
@@ -112,7 +112,7 @@ ggplot(data = diamonds, mapping = aes(x = cut)) +
 ```
 
 The height of the bars displays how many observations occurred with each x value.
-You can compute these values manually with `dplyr::count()`:
+You can compute these values manually with `count()`:
 
 ```{r}
 diamonds |>
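With dplyr attached (as `library(tidyverse)` does), `count()` and `dplyr::count()` name the same function, which is why the hunk above can drop the prefix. A minimal sketch, assuming dplyr and ggplot2 are installed (ggplot2 supplies `diamonds`) and R >= 4.1 for the native pipe `|>`; the name `by_cut` is illustrative:

```r
library(dplyr)    # attaches count()
library(ggplot2)  # provides the diamonds dataset

# Number of diamonds of each cut: the same counts the bar chart displays
by_cut <- diamonds |> count(cut)
by_cut

# The fully qualified form calls the same function, so the results match
identical(by_cut, diamonds |> dplyr::count(cut))
```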
@@ -136,7 +136,7 @@ ggplot(data = diamonds, mapping = aes(x = carat)) +
   geom_histogram(binwidth = 0.5)
 ```
 
-You can compute this by hand by combining `dplyr::count()` and `ggplot2::cut_width()`:
+You can compute this by hand by combining `count()` and `cut_width()`:
 
 ```{r}
 diamonds |>
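The `count()` plus `cut_width()` combination can be sketched like this, assuming dplyr and ggplot2 are installed. `cut_width(carat, 0.5)` returns a factor of 0.5-carat intervals, and `count()` returns one row per interval that contains at least one diamond; the column name `bin` is an illustrative choice:

```r
library(dplyr)    # count()
library(ggplot2)  # diamonds and cut_width()

# cut_width() turns carat into a factor of intervals 0.5 carats wide;
# count() then tallies how many diamonds fall into each interval.
carat_bins <- diamonds |>
  count(bin = cut_width(carat, 0.5))
carat_bins
```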
@@ -359,17 +359,17 @@ If you've encountered unusual values in your dataset, and simply want to move on
 
 2. Instead, I recommend replacing the unusual values with missing values.
 The easiest way to do this is to use `mutate()` to replace the variable with a modified copy.
-You can use the `ifelse()` function to replace unusual values with `NA`:
+You can use the `if_else()` function to replace unusual values with `NA`:
 
 ```{r}
 diamonds2 <- diamonds |>
-  mutate(y = ifelse(y < 3 | y > 20, NA, y))
+  mutate(y = if_else(y < 3 | y > 20, NA, y))
 ```
 
-`ifelse()` has three arguments.
+`if_else()` has three arguments.
 The first argument `test` should be a logical vector.
 The result will contain the value of the second argument, `yes`, when `test` is `TRUE`, and the value of the third argument, `no`, when it is false.
-Alternatively to `if_else()`, use `dplyr::case_when()`.
+Alternatively to `if_else()`, use `case_when()`.
 `case_when()` is particularly useful inside mutate when you want to create a new variable that relies on a complex combination of existing variables or would otherwise require multiple `if_else()` statements nested inside one another.
 
 Like R, ggplot2 subscribes to the philosophy that missing values should never silently go missing.
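The `if_else()` and `case_when()` replacements can be compared side by side. A minimal sketch, assuming dplyr >= 1.1.0: earlier versions of `if_else()` rejected a bare logical `NA` next to a double `y` (they needed `NA_real_`), and `case_when()` gained `.default` in 1.1.0. Note that dplyr's `if_else()` names its arguments `condition`, `true`, and `false`; `test`, `yes`, and `no` are the argument names of base R's `ifelse()`. The name `diamonds3` is illustrative:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Replace implausible widths (y, in mm) with NA using if_else(),
# spelling out dplyr's argument names condition, true, and false
diamonds2 <- diamonds |>
  mutate(y = if_else(condition = y < 3 | y > 20, true = NA, false = y))

# The same replacement with case_when(); `.default` needs dplyr >= 1.1.0
diamonds3 <- diamonds |>
  mutate(y = case_when(y < 3 | y > 20 ~ NA, .default = y))

# Both approaches produce the same column
identical(diamonds2$y, diamonds3$y)

# How many values were turned into NA
sum(is.na(diamonds2$y))
```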
@@ -397,10 +397,12 @@ ggplot(data = diamonds2, mapping = aes(x = x, y = y)) +
 ```
 
 Other times you want to understand what makes observations with missing values different to observations with recorded values.
-For example, in `nycflights13::flights`, missing values in the `dep_time` variable indicate that the flight was cancelled.
+For example, in `nycflights13::flights`[^eda-1], missing values in the `dep_time` variable indicate that the flight was cancelled.
 So you might want to compare the scheduled departure times for cancelled and non-cancelled times.
 You can do this by making a new variable with `is.na()`.
 
+[^eda-1]: Remember that when need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()` or `package::dataset`.
+
 ```{r}
 #| fig-alt: >
 #| A frequency polygon of scheduled departure times of flights. Two lines
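The `is.na()` step described above can be sketched as follows, assuming dplyr and nycflights13 are installed. The `package::dataset` form reads `flights` without calling `library(nycflights13)`; the name `cancellations` is illustrative:

```r
library(dplyr)

# nycflights13::flights reads the dataset without attaching nycflights13.
# A missing dep_time marks a cancelled flight, so is.na() gives a logical flag.
cancellations <- nycflights13::flights |>
  mutate(cancelled = is.na(dep_time)) |>
  count(cancelled)
cancellations
```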
@@ -31,6 +31,8 @@ library(tidyverse)
 Take careful note of the conflicts message that's printed when you load the tidyverse.
 It tells you that dplyr overwrites some functions in base R.
 If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()` and `stats::lag()`.
+So far we've mostly ignored which package a function comes from because most of the time it doesn't matter.
+However, knowing the package can help you find help and find related functions, so when we need to be precise about which function a package comes from, we'll use the same syntax as R: `packagename::functionname()`.
 
 ### nycflights13
 
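The masking described in the conflicts message is easiest to see by calling both versions through their full names. A minimal sketch, assuming dplyr is installed (`mtcars` ships with base R); the inputs and variable names are illustrative. Because every call uses `::`, neither package needs to be attached:

```r
# stats::filter() applies a linear filter to a series; here a centred
# 3-point moving average (sides = 2 is the default). It returns a ts object.
moving_avg <- as.numeric(stats::filter(1:5, rep(1 / 3, 3)))
moving_avg

# dplyr::filter() keeps the rows of a data frame that match a condition
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)

# dplyr::lag() shifts the values of a vector back by one position
dplyr::lag(1:5)
```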
@@ -42,9 +42,6 @@ library(tidyverse)
 
 You only need to install a package once, but you need to reload it every time you start a new session.
 
-If we need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()`.
-For example, `ggplot2::ggplot()` tells you explicitly that we're using the `ggplot()` function from the ggplot2 package.
-
 ## First steps
 
 Let's use our first graph to answer a question: Do cars with big engines use more fuel than cars with small engines?
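The point of the removed `ggplot2::ggplot()` sentence can be sketched as follows, assuming ggplot2 is installed; plotting `ggplot2::mpg` engine size against highway mileage is an illustrative choice. Every call is fully qualified, so ggplot2 is loaded but never attached, and `environment()` reports the namespace a function was defined in:

```r
# Build a plot using only fully qualified names
p <- ggplot2::ggplot(ggplot2::mpg, ggplot2::aes(x = displ, y = hwy)) +
  ggplot2::geom_point()

# Ask R which package namespace ggplot() belongs to
environmentName(environment(ggplot2::ggplot))
```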