Use dev tidyverse (#1240)

* lubridate is now core
* briefly mention conflicted

Fixes #1105

parent a3b40d8dd8
commit d3a9919967

@@ -49,6 +49,7 @@ Remotes:
 tidyverse/dplyr,
 tidyverse/dbplyr,
 tidyverse/tidyr,
-tidyverse/purrr
+tidyverse/purrr,
+tidyverse/tidyverse
 Encoding: UTF-8
 License: CC NC ND 3.0

@@ -24,7 +24,7 @@ We'll finish off with saving your plots and troubleshooting tips.
 ### Prerequisites
 
 This chapter focuses on ggplot2, one of the core packages in the tidyverse.
-To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running this code:
+To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running:
 
 ```{r}
 #| label: setup
@@ -32,8 +32,11 @@ To access the datasets, help pages, and functions used in this chapter, load the
 library(tidyverse)
 ```
 
-That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
-It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
+That one line of code loads the core tidyverse: the packages that you will use in almost every data analysis.
+It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded)[^data-visualize-1].
+
+[^data-visualize-1]: You can eliminate that message and force conflict resolution to happen on demand by using the conflicted package, which becomes more important as you load more packages.
+You can learn more about conflicted at <https://conflicted.r-lib.org>.
 
 If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
 
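The new footnote points readers to the conflicted package. The sketch below shows the idea. It assumes dplyr is installed along with a recent conflicted release; `conflicts_prefer()` is not available in older versions. `mtcars` is used only because it ships with R.

```r
library(conflicted)
library(dplyr)

# With conflicted attached, an ambiguous call such as filter(mtcars, cyl == 8)
# stops with an error, because both dplyr and stats export a filter().
# Declaring a preference once resolves the conflict for the rest of the session.
conflicts_prefer(dplyr::filter)

# This now calls dplyr::filter() without complaint.
eight_cyl <- filter(mtcars, cyl == 8)
nrow(eight_cyl)
```

The trade-off is deliberate. Conflicts surface as errors at the moment an ambiguous function is called, rather than as a message at load time that is easy to ignore.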
@@ -44,7 +47,7 @@ install.packages("tidyverse")
 library(tidyverse)
 ```
 
-You only need to install a package once, but you need to reload it every time you start a new session.
+You only need to install a package once, but you need to load it every time you start a new session.
 
 In addition to tidyverse, we will also use the **palmerpenguins** package, which includes the `penguins` dataset containing body measurements for penguins on three islands in the Palmer Archipelago.
 
@@ -68,9 +71,9 @@ And how about by the island where the penguin lives.
 
 You can test your answer with the `penguins` **data frame** found in palmerpenguins (a.k.a. `palmerpenguins::penguins`).
 A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
-`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-1].
+`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-2].
 
-[^data-visualize-1]: Horst AM, Hill AP, Gorman KB (2020).
+[^data-visualize-2]: Horst AM, Hill AP, Gorman KB (2020).
 palmerpenguins: Palmer Archipelago (Antarctica) penguin data.
 R package version 0.1.0.
 <https://allisonhorst.github.io/palmerpenguins/>.
@@ -741,10 +744,10 @@ However adding too many aesthetic mappings to a plot makes it cluttered and diff
 Another way, which is particularly useful for categorical variables, is to split your plot into **facets**, subplots that each display one subset of the data.
 
 To facet your plot by a single variable, use `facet_wrap()`.
-The first argument of `facet_wrap()` is a formula[^data-visualize-2], which you create with `~` followed by a variable name.
+The first argument of `facet_wrap()` is a formula[^data-visualize-3], which you create with `~` followed by a variable name.
 The variable that you pass to `facet_wrap()` should be categorical.
 
-[^data-visualize-2]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
+[^data-visualize-3]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
 
 ```{r}
 #| warning: false

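The renumbered footnote says that a formula is an object created by `~`, not an equation. That claim can be checked in base R, with no packages loaded. This sketch only builds the object. In the chapter, the same expression would be passed as `facet_wrap(~island)`.

```r
# `~` does not evaluate its operand; it captures the expression unevaluated
# and returns an object of class "formula". `island` need not exist yet.
f <- ~island

class(f)     # "formula"
all.vars(f)  # "island", the variable facet_wrap() would split the plot by
length(f)    # 2: the `~` call plus its single right-hand side
```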
@@ -35,14 +35,13 @@ We'll conclude with a brief discussion of the additional challenges posed by tim
 ### Prerequisites
 
 This chapter will focus on the **lubridate** package, which makes it easier to work with dates and times in R.
-lubridate is not part of core tidyverse because you only need it when you're working with dates/times.
+As of the latest tidyverse release, lubridate is part of core tidyverse.
 We will also need nycflights13 for practice data.
 
 ```{r}
 #| message: false
 library(tidyverse)
 
-library(lubridate)
 library(nycflights13)
 ```
 

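Because lubridate is now part of the core, `library(tidyverse)` on its own makes its functions available. This quick check assumes tidyverse 2.0.0 or later is installed. That is the first CRAN release with lubridate in the core; this commit targets it through the development version listed under `Remotes:`.

```r
library(tidyverse)

# tidyverse 2.0.0+ attaches lubridate along with the other core packages,
# so no separate library(lubridate) call is needed.
"package:lubridate" %in% search()  # TRUE

# lubridate functions can be used straight away.
d <- ymd("2023-02-01")
month(d)  # 2
```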
@@ -200,7 +200,7 @@ Once you have installed a package, you can load it using the `library()` functio
 library(tidyverse)
 ```
 
-This tells you that tidyverse loads eight packages: ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats.
+This tells you that tidyverse loads nine packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, tidyr.
 These are considered the **core** of the tidyverse because you'll use them in almost every analysis.
 
 Packages in the tidyverse change fairly frequently.

@@ -308,8 +308,6 @@ df_miss |> filter(if_all(a:d, is.na))
 For example, [Jacob Scott](https://twitter.com/_wurli/status/1571836746899283969) uses this little helper which wraps a bunch of lubridate functions to expand all date columns into year, month, and day columns:
 
 ```{r}
-library(lubridate)
-
 expand_dates <- function(df) {
   df |>
     mutate(
@@ -687,7 +685,8 @@ Now when you come back to this problem in the future, you can read in a single c
 unlink("gapminder.csv")
 ```
 
-If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`. The `0` in the file name suggests that this should be run before anything else.
+If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`.
+The `0` in the file name suggests that this should be run before anything else.
 
 If your input data files change over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run whenever one of the input files is modified.
 