Update post map_df

2015-11-11 11:31:15 -06:00 · eaf25f50c6
parent 1aea6351a9
commit eaf25f50c6
1 changed file with 6 additions and 9 deletions
--- a/lists.Rmd
+++ b/lists.Rmd
@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
 * `map_int()`: integer vector
 * `map_dbl()`: double vector
 * `map_chr()`: character vector
+* `map_df()`:  a data frame

 Each of these functions take a list as input, apply a function to each piece and then return a new vector that's the same length as the input. Because the first element is the list to transform, it also makes them particularly suitable for piping:

@@ -185,7 +186,6 @@ Other outputs:
 * `flatten()`
 * `map_int()` vs. `map()` + `flatten_int()`
 * `flatmap()`
-* `dplyr::bind_rows()`

 Need sidebar/callout about predicate functions somewhere. Better to use purrr's underscore variants because they tend to do what you expect, and 

@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
 issues %>% map_int(c("user", "id"))
 ```

-
 ### Predicate functions

 Imagine we want to summarise each numeric column of a data frame. We could write this:
@@ -340,14 +339,13 @@ x[error]
 y[!error] %>% map("result")
 ```

-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why? 

 ```{r, eval = FALSE}
 files <- dir("data", pattern = "\\.csv$")
 files %>%
-  set_names(basename(.)) %>%
+  set_names(., basename(.)) %>%
  map_df(readr::read_csv, .id = "filename") %>%
 ```

@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
 mod <- trn %>% map(~lm(mpg ~ wt, data = .))
 ```

-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients using broom, and make a single data frame with `map_df()` and then visualise the distributions with ggplot2:

 ```{r}
 coef <- mod %>% 
  map(broom::tidy) %>% 
-  dplyr::bind_rows(.id = "i")
+  map_df(.id = "i")
 coef

 library(ggplot2)
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a
 data frame. Need example that has some covariates so you can (e.g.)
 select all models for females, or under 30s, ...

-
 ## "Tidying" lists

 I don't know know how to put this stuff in words yet, but I know it