Incorporating suggestions from @csgillespie

hadley
2016-10-04 07:49:10 -05:00
parent fd9a3f57f7
commit b3855be66c
3 changed files with 72 additions and 64 deletions

@@ -8,7 +8,7 @@ Visualisation is an important tool for insight generation, but it is rare that y
In this chapter we're going to focus on how to use the dplyr package, another core member of the tidyverse. We'll illustrate the key ideas using data from the nycflights13 package, and use ggplot2 to help us understand the data.
-```{r setup}
+```{r setup, message = FALSE}
library(nycflights13)
library(tidyverse)
```
@@ -44,7 +44,7 @@ There are three other common types of variables that aren't used in this dataset
* `date` stands for dates.
-### Dplyr basics
+### dplyr basics
In this chapter you are going to learn the five key dplyr functions that allow you to solve the vast majority of your data manipulation challenges:
@@ -431,7 +431,7 @@ There are many functions for creating new variables that you can use with `mutat
dense_rank(y),
percent_rank(y),
cume_dist(y)
-) %>% knitr::kable()
+)
```
### Exercises
@@ -594,7 +594,7 @@ delays <- not_cancelled %>%
)
ggplot(data = delays, mapping = aes(x = n, y = delay)) +
-geom_point()
+geom_point(alpha = 1/10)
```
Not surprisingly, there is much greater variation in the average delay when there are few flights. The shape of this plot is very characteristic: whenever you plot a mean (or other summary) vs. group size, you'll see that the variation decreases as the sample size increases.
@@ -605,7 +605,7 @@ When looking at this sort of plot, it's often useful to filter out the groups wi
delays %>%
filter(n > 25) %>%
ggplot(mapping = aes(x = n, y = delay)) +
-geom_point()
+geom_point(alpha = 1/10)
```