Update post map_df

hadley 2015-11-11 11:31:15 -06:00
parent 1aea6351a9
commit eaf25f50c6
1 changed file with 6 additions and 9 deletions


@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
* `map_int()`: integer vector
* `map_dbl()`: double vector
* `map_chr()`: character vector
+* `map_df()`: a data frame
Each of these functions takes a list as input, applies a function to each piece, and then returns a new vector that's the same length as the input (the exception is `map_df()`, which binds the data frames produced for each piece into one data frame by rows). Because the first argument is the list to transform, they are also particularly suitable for piping:
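Here is a rough base-R illustration of how the vector-returning variants differ from `map_df()`. It is a sketch, not purrr's implementation. `vapply()` with a declared result type behaves much like `map_dbl()`, and stacking per-element data frames with `do.call(rbind, ...)` gives the same shape of result as `map_df()`. The list `x` is hypothetical:

```r
# Hypothetical input: a named list of numeric vectors
x <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

# Analogue of map_dbl(): exactly one double per element, so the result is a
# double vector with the same length (and names) as the input
means <- vapply(x, mean, numeric(1))

# Analogue of map_df(): each element yields a data frame and the pieces are
# stacked by row, so the row count is the total across all pieces
long <- do.call(rbind, lapply(names(x), function(nm) {
  data.frame(name = nm, value = x[[nm]], stringsAsFactors = FALSE)
}))
rownames(long) <- NULL
```

`means` is a named double vector (2, 3, 10), the same length as `x`. `long` has six rows, one per value across the three elements.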
@@ -185,7 +186,6 @@ Other outputs:
* `flatten()`
* `map_int()` vs. `map()` + `flatten_int()`
* `flatmap()`
-* `dplyr::bind_rows()`
Need sidebar/callout about predicate functions somewhere. Better to use purrr's underscore variants because they tend to do what you expect, and
@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
issues %>% map_int(c("user", "id"))
```
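The same nested extraction can be sketched in base R for comparison. When `[[` is given a character vector it indexes recursively, so `issue[[c("user", "id")]]` follows `user` and then `id`. `vapply()` requires one value of the declared type per element, much as `map_int()` does. The two-record `issues` list is a hypothetical stand-in, since the real data isn't shown here:

```r
# Hypothetical stand-in for `issues`: each record holds a nested `user` list
issues <- list(
  list(number = 1L, user = list(login = "alice", id = 101L)),
  list(number = 2L, user = list(login = "bob", id = 202L))
)

# `[[` with a character vector indexes recursively (user, then login/id);
# vapply() requires exactly one value of the stated type per element
logins <- vapply(issues, function(issue) issue[[c("user", "login")]], character(1))
ids <- vapply(issues, function(issue) issue[[c("user", "id")]], integer(1))
```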
### Predicate functions
Imagine we want to summarise each numeric column of a data frame. We could write this:
@@ -340,14 +339,13 @@ x[error]
y[!error] %>% map("result")
```
-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why?
```{r, eval = FALSE}
files <- dir("data", pattern = "\\.csv$", full.names = TRUE)
files %>%
-set_names(basename(.)) %>%
+set_names(., basename(.)) %>%
map_df(readr::read_csv, .id = "filename")
```
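Below is a base-R sketch of one answer to the challenge. `tryCatch()` stands in for purrr's `safely()`, and `read.csv()` stands in for `readr::read_csv()`. The directory and its three files are hypothetical and are created in a temporary location. `broken.csv` is deliberately empty so that reading it fails:

```r
# Hypothetical directory: two readable files and one empty (unreadable) file
dir_path <- file.path(tempdir(), "csv-demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c("a", "b")), file.path(dir_path, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3L, y = "c"), file.path(dir_path, "b.csv"), row.names = FALSE)
invisible(file.create(file.path(dir_path, "broken.csv")))

# Full paths for reading, base names for labelling
files <- dir(dir_path, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Capture either a result or an error for each file, like safely()
results <- lapply(files, function(f) {
  tryCatch(list(result = read.csv(f, stringsAsFactors = FALSE), error = NULL),
           error = function(e) list(result = NULL, error = e))
})

# Which files failed, and why
failed <- vapply(results, function(r) !is.null(r$error), logical(1))
reasons <- vapply(results[failed], function(r) conditionMessage(r$error), character(1))

# Stack the successful reads, recording the source file in a column
ok <- lapply(results[!failed], function(r) r$result)
combined <- do.call(rbind, Map(function(df, id) {
  data.frame(filename = id, df, stringsAsFactors = FALSE)
}, ok, names(ok)))
rownames(combined) <- NULL
```

`reasons` records which file failed and why. The `filename` column in `combined` plays the role of `.id = "filename"`.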
@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
mod <- trn %>% map(~lm(mpg ~ wt, data = .))
```
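Fitting one model per resample and stacking the coefficients into a single data frame can be sketched in base R. The resamples built with `sample()` are a hypothetical stand-in for `trn`. `tidy_coef()` is a hypothetical, cut-down substitute for `broom::tidy()` that keeps only the term and the estimate. With purrr, the tidy-and-stack step would be a single call such as `mod %>% map_df(broom::tidy, .id = "i")`, because `map_df()` takes the function to apply as its second argument:

```r
set.seed(1)
# Hypothetical resamples of mtcars (rows drawn with replacement)
trn <- lapply(1:3, function(i) mtcars[sample(nrow(mtcars), replace = TRUE), ])

# One linear model per resample, as map(~lm(mpg ~ wt, data = .)) would give
mod <- lapply(trn, function(d) lm(mpg ~ wt, data = d))

# Hypothetical cut-down tidier: one row per coefficient
tidy_coef <- function(m) {
  data.frame(term = names(coef(m)), estimate = unname(coef(m)),
             stringsAsFactors = FALSE)
}

# Tidy each model, then stack the pieces with an index column `i`
pieces <- lapply(mod, tidy_coef)
coefs <- do.call(rbind, Map(function(df, i) {
  data.frame(i = i, df, stringsAsFactors = FALSE)
}, pieces, seq_along(pieces)))
rownames(coefs) <- NULL
```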
-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients with broom, combine them into a single data frame with `map_df()`, and then visualise the distributions with ggplot2:
```{r}
coef <- mod %>%
map(broom::tidy) %>%
-dplyr::bind_rows(.id = "i")
+map_df(.id = "i")
coef
library(ggplot2)
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a
data frame. Need example that has some covariates so you can (e.g.)
select all models for females, or under 30s, ...
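Here is a hedged base-R sketch of that idea. Each fitted model is stored in a list column beside the variables that describe it, so selecting rows by those variables also selects the matching models. `mtcars` has no sex or age columns, so the number of cylinders (`cyl`) stands in as the covariate:

```r
# Split mtcars into one data frame per cylinder count (4, 6, 8)
groups <- split(mtcars, mtcars$cyl)

# One row per group: the covariate in an ordinary column and the fitted
# model in a list column next to it
models <- data.frame(cyl = as.numeric(names(groups)))
models$fit <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# Subsetting rows by the covariate selects the matching models too
small <- models[models$cyl < 8, ]
slopes <- vapply(small$fit, function(m) coef(m)[["wt"]], numeric(1))
```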
## "Tidying" lists
I don't know how to put this stuff in words yet, but I know it