Update post map_df
parent 1aea6351a9
commit eaf25f50c6
lists.Rmd: 15 changed lines
````diff
@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
 * `map_int()`: integer vector
 * `map_dbl()`: double vector
 * `map_chr()`: character vector
+* `map_df()`: a data frame
 
 Each of these functions take a list as input, apply a function to each piece and then return a new vector that's the same length as the input. Because the first element is the list to transform, it also makes them particularly suitable for piping:
 
````
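A minimal sketch of the four typed variants in the list above, including the newly added `map_df()`. It assumes purrr is installed, plus dplyr, which `map_df()` uses to bind rows. The list `x` is made up for illustration:

```r
library(purrr)  # map_df() also needs dplyr installed

# A made-up list; each element is a numeric vector
x <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

# The list comes first, so each call pipes naturally. Every result has
# one element per input element and keeps the input's names.
n_int  <- x %>% map_int(length)                        # integer vector
means  <- x %>% map_dbl(mean)                          # double vector
pasted <- x %>% map_chr(~ paste(.x, collapse = "-"))   # character vector

# map_df() row-binds one data frame per element; .id adds a column
# holding each element's name
summary_df <- x %>%
  map_df(~ data.frame(n = length(.x), mean = mean(.x)), .id = "name")
```

The first three return atomic vectors the same length as `x`. `map_df()` instead returns one row per element here, because each call produces a one-row data frame.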
````diff
@@ -185,7 +186,6 @@ Other outputs:
 * `flatten()`
 * `map_int()` vs. `map()` + `flatten_int()`
 * `flatmap()`
-* `dplyr::bind_rows()`
 
 Need sidebar/callout about predicate functions somewhere. Better to use purrr's underscore variants because they tend to do what you expect, and
 
````
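The notes above compare `map_int()` with `map()` + `flatten_int()`. A small sketch of that equivalence, and of the matching one between `map()` + `dplyr::bind_rows()` and `map_df()`, assuming purrr and dplyr are installed. `flatten_int()` is superseded in purrr 1.0 but still exported:

```r
library(purrr)

x <- list(1:3, 4:5, 6L)

# map() always returns a list; flatten_int() collapses a list of
# length-1 integers into an integer vector
via_flatten <- x %>% map(length) %>% flatten_int()

# map_int() does both steps at once and errors if any result
# is not a single integer
via_map_int <- x %>% map_int(length)

# Likewise, map() followed by dplyr::bind_rows() gives the same
# data frame as map_df()
via_bind   <- x %>% map(~ data.frame(n = length(.x))) %>% dplyr::bind_rows()
via_map_df <- x %>% map_df(~ data.frame(n = length(.x)))
```

The typed variants are stricter as well as shorter: `map_int()` checks the type and length of each result, where `flatten_int()` only sees the already-collected list.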
````diff
@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
 issues %>% map_int(c("user", "id"))
 ```
 
-
 ### Predicate functions
 
 Imagine we want to summarise each numeric column of a data frame. We could write this:
````
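One way to use a predicate function for the "summarise each numeric column" problem is `purrr::keep()`. It keeps only the elements (here, the columns of a data frame) for which the predicate returns `TRUE`. A sketch with a made-up data frame:

```r
library(purrr)

dat <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "c"),
  z = c(10L, 20L, 30L)
)

# is.numeric() is the predicate: keep() drops the character column,
# so mean() is only ever applied to numeric columns
col_means <- dat %>% keep(is.numeric) %>% map_dbl(mean)
col_means
```

`discard()` is the complement. It would keep only the columns for which `is.numeric()` returns `FALSE`.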
````diff
@@ -340,14 +339,13 @@ x[error]
 y[!error] %>% map("result")
 ```
 
-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why?
 
 ```{r, eval = FALSE}
 files <- dir("data", pattern = "\\.csv$")
 files %>%
-set_names(basename(.)) %>%
+set_names(., basename(.)) %>%
 map_df(readr::read_csv, .id = "filename") %>%
 ```
 
````
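Here is a sketch of one answer to the challenge, under several assumptions. The files are written to a temporary directory rather than `data/`. Base `utils::read.csv()` stands in for `readr::read_csv()`. The zero-byte `empty.csv` is a deliberately broken file. The reader is wrapped in `safely()` so one bad file doesn't stop the rest, and only the successful reads are bound with `map_df()`:

```r
library(purrr)  # map_df() also needs dplyr installed

# Hypothetical setup: two good CSV files and one empty (broken) one
dir_path <- file.path(tempdir(), "csv-demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(dir_path, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 3L), file.path(dir_path, "two.csv"), row.names = FALSE)
invisible(file.create(file.path(dir_path, "empty.csv")))

# dir() returns bare file names unless full.names = TRUE;
# set_names() labels each full path with its file name
files <- dir(dir_path, pattern = "\\.csv$", full.names = TRUE)
files <- set_names(files, basename(files))

# safely() turns errors into data: each result is list(result, error)
results <- map(files, safely(utils::read.csv))
ok <- map_lgl(results, ~ is.null(.x$error))

failed <- names(files)[!ok]                                  # which failed
why <- map_chr(results[!ok], ~ conditionMessage(.x$error))   # and why

# Bind the successful reads; .id records the source file of each row
all_rows <- map_df(results[ok], "result", .id = "filename")
```

Because `files` is named, those names flow through `map()` into `results`. `.id` can then turn them into the `filename` column without any extra bookkeeping.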
````diff
@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
 mod <- trn %>% map(~lm(mpg ~ wt, data = .))
 ```
 
-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients using broom, and make a single data frame with `map_df()` and then visualise the distributions with ggplot2:
 
 ```{r}
 coef <- mod %>%
 map(broom::tidy) %>%
-dplyr::bind_rows(.id = "i")
+map_df(.id = "i")
 coef
 
 library(ggplot2)
````
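A runnable sketch of the coefficient step. It passes the per-model function straight to `map_df()`, so the function is applied and the results row-bound in one call. The bootstrap resamples in `trn` are hypothetical. The small `tidy_coef()` helper is a made-up stand-in for `broom::tidy()` that returns only the term and estimate:

```r
library(purrr)  # map_df() also needs dplyr installed

set.seed(1)
# Hypothetical training sets: five bootstrap resamples of mtcars
trn <- map(1:5, ~ mtcars[sample(nrow(mtcars), replace = TRUE), ])
mod <- trn %>% map(~ lm(mpg ~ wt, data = .))

# Stand-in for broom::tidy(): one row per coefficient
tidy_coef <- function(fit) {
  est <- coef(fit)
  data.frame(term = names(est), estimate = unname(est))
}

# map_df() applies tidy_coef() to each model and row-binds the results;
# .id = "i" records which model each row came from ("1" to "5")
coef_df <- mod %>% map_df(tidy_coef, .id = "i")
```

The result has one row per coefficient per model. That long shape is convenient for plotting the distribution of each term with ggplot2.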
````diff
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a
 data frame. Need example that has some covariates so you can (e.g.)
 select all models for females, or under 30s, ...
 
-
 ## "Tidying" lists
 
 I don't know know how to put this stuff in words yet, but I know it
````
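A rough sketch of storing models alongside their covariates in one base R data frame, using a list column. The covariate here is `am` (transmission type) from `mtcars`. It stands in for the sex or age groups the note mentions:

```r
library(purrr)

# One group of rows per value of the (stand-in) covariate
groups <- split(mtcars, mtcars$am)

models <- data.frame(am = as.integer(names(groups)))
# A list column keeps each fitted model aligned with its covariates
models$fit <- map(groups, ~ lm(mpg ~ wt, data = .x))
models$slope <- map_dbl(models$fit, ~ coef(.x)[["wt"]])

# Selecting rows by a covariate brings the matching models along
manual <- models[models$am == 1, ]
```

Keeping the models in a column means that ordinary row subsetting can't separate a model from the covariates it was fitted for.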