Merge branch 'master' of github.com:hadley/r4ds

# Conflicts:
#	EDA.Rmd
hadley
2016-07-31 11:41:30 -05:00
3 changed files with 41 additions and 37 deletions


@@ -24,7 +24,7 @@ To explore the basic data manipulation verbs of dplyr, we'll use `nycflights13::
flights
```
-You might notice that this data frame prints little differently to other data frames you might have used in the past: it only shows the first few rows and all the columns that fit on one screen. (To see the whole dataset, you can run `View(flights)` which will open the dataset in the RStudio viewer). It prints differently because it's a __tibble__. Tibbles are data frames, but slightly tweaked to work better in the tidyverse. For now, you don't need to worry about the differences; we'll come back to tibbles in more detail in [wrangle](#wrangle-intro).
+You might notice that this data frame prints a little differently from other data frames you might have used in the past: it only shows the first few rows and all the columns that fit on one screen. (To see the whole dataset, you can run `View(flights)` which will open the dataset in the RStudio viewer). It prints differently because it's a __tibble__. Tibbles are data frames, but slightly tweaked to work better in the tidyverse. For now, you don't need to worry about the differences; we'll come back to tibbles in more detail in [wrangle](#wrangle-intro).
You might also have noticed the row of three letter abbreviations under the column names. These describe the type of each variable:
@@ -420,7 +420,7 @@ There are many functions for creating new variables that you can use with `mutat
(e.g. 1st, 2nd, 2nd, 4th). The default gives smallest values the small
ranks; use `desc(x)` to give the largest values the smallest ranks.
If `min_rank()` doesn't do what you need, look at the variants
-    `row_number()`, `dense_rank()`, `cume_dist()`, `percent_rank()`,
+    `row_number()`, `dense_rank()`, `percent_rank()`, `cume_dist()`,
`ntile()`.
```{r}
@@ -475,7 +475,7 @@ The last key verb is `summarise()`. It collapses a data frame to a single row:
summarise(flights, delay = mean(dep_delay, na.rm = TRUE))
```
-(we'll come back to what that `na.rm = TRUE` means very shortly.)
+(We'll come back to what that `na.rm = TRUE` means very shortly.)
`summarise()` is not terribly useful unless we pair it with `group_by()`. This changes the unit of analysis from the complete dataset to individual groups. Then, when you use the dplyr verbs on a grouped data frame they'll be automatically applied "by group". For example, if we applied exactly the same code to a data frame grouped by date, we get the average delay per date: