Update post map_df
lists.Rmd | 15
@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
 * `map_int()`: integer vector
 * `map_dbl()`: double vector
 * `map_chr()`: character vector
+* `map_df()`:  a data frame
 
 Each of these functions take a list as input, apply a function to each piece and then return a new vector that's the same length as the input. Because the first element is the list to transform, it also makes them particularly suitable for piping:
 
@@ -185,7 +186,6 @@ Other outputs:
 * `flatten()`
 * `map_int()` vs. `map()` + `flatten_int()`
 * `flatmap()`
-* `dplyr::bind_rows()`
 
 Need sidebar/callout about predicate functions somewhere. Better to use purrr's underscore variants because they tend to do what you expect, and 
 
@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
 issues %>% map_int(c("user", "id"))
 ```
 
-
 ### Predicate functions
 
 Imagine we want to summarise each numeric column of a data frame. We could write this:
@@ -340,14 +339,13 @@ x[error]
 y[!error] %>% map("result")
 ```
 
-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why? 
 
 ```{r, eval = FALSE}
 files <- dir("data", pattern = "\\.csv$")
 files %>%
-  set_names(basename(.)) %>%
+  set_names(., basename(.)) %>%
   map_df(readr::read_csv, .id = "filename") %>%
 ```
 
@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
 mod <- trn %>% map(~lm(mpg ~ wt, data = .))
 ```
 
-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients using broom, and make a single data frame with `map_df()` and then visualise the distributions with ggplot2:
 
 ```{r}
 coef <- mod %>% 
   map(broom::tidy) %>% 
-  dplyr::bind_rows(.id = "i")
+  map_df(.id = "i")
 coef
 
 library(ggplot2)
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a
 data frame. Need example that has some covariates so you can (e.g.)
 select all models for females, or under 30s, ...
 
-
 ## "Tidying" lists
 
 I don't know know how to put this stuff in words yet, but I know it
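
Note on the `bind_rows()` to `map_df()` swap in the last two code hunks: as far as I can tell, `map_df(.x, .f, .id = ...)` behaves like `map(.x, .f)` followed by `dplyr::bind_rows(.id = ...)`. The sketch below checks that on a small case. It assumes purrr and dplyr are installed; `trn` is a hypothetical stand-in built from `mtcars`, and `coef_df()` is a helper invented for illustration in place of `broom::tidy`. Newer purrr releases mark `map_df()` as superseded, so treat this as a sketch of the idea, not the recommended current API.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for the `trn` list used in the diff:
# mtcars split into one data frame per cylinder count.
trn <- split(mtcars, mtcars$cyl)

# Fit one model per piece, as in the diff.
mod <- trn %>% map(~ lm(mpg ~ wt, data = .))

# Illustrative helper (an assumption, used here instead of broom::tidy):
# returns one row per coefficient.
coef_df <- function(m) {
  data.frame(
    term = names(coef(m)),
    estimate = unname(coef(m)),
    stringsAsFactors = FALSE
  )
}

# Two-step version: map, then row-bind with the list names stored in `i`.
two_step <- mod %>% map(coef_df) %>% bind_rows(.id = "i")

# One-step version: map_df() is given the function to apply directly.
one_step <- mod %>% map_df(coef_df, .id = "i")

# Compare the two results.
identical(two_step, one_step)
```

In both versions the function to apply is passed explicitly; `map_df()` takes it as its second argument, the same way `map()` does.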