Merge branch 'master' of github.com:hadley/r4ds

# Conflicts:
#	EDA.Rmd
2016-07-31 11:41:30 -05:00
parent bd5a7a1c3a 3eb371e111
commit c4168bbd37
3 changed files with 41 additions and 37 deletions
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -24,7 +24,7 @@ To explore the basic data manipulation verbs of dplyr, we'll use `nycflights13::
 flights
 ```

-You might notice that this data frame prints little differently to other data frames you might have used in the past: it only shows the first few rows and all the columns that fit on one screen. (To see the whole dataset, you can run `View(flights)` which will open the dataset in the RStudio viewer). It prints differently because it's a __tibble__. Tibbles are data frames, but slightly tweaked to work better in the tidyverse. For now, you don't need to worry about the differences; we'll come back to tibbles in more detail in [wrangle](#wrangle-intro).
+You might notice that this data frame prints a little differently from other data frames you might have used in the past: it only shows the first few rows and all the columns that fit on one screen. (To see the whole dataset, you can run `View(flights)` which will open the dataset in the RStudio viewer). It prints differently because it's a __tibble__. Tibbles are data frames, but slightly tweaked to work better in the tidyverse. For now, you don't need to worry about the differences; we'll come back to tibbles in more detail in [wrangle](#wrangle-intro).
 
 You might also have noticed the row of three letter abbreviations under the column names. These describe the type of each variable:

@@ -420,7 +420,7 @@ There are many functions for creating new variables that you can use with `mutat
    (e.g. 1st, 2nd, 2nd, 4th). The default gives smallest values the small
    ranks; use `desc(x)` to give the largest values the smallest ranks. 
    If `min_rank()` doesn't do what you need, look at the variants 
-    `row_number()`, `dense_rank()`, `cume_dist()`, `percent_rank()`, 
+    `row_number()`, `dense_rank()`, `percent_rank()`, `cume_dist()`,  
    `ntile()`.
    
    ```{r}
@@ -475,7 +475,7 @@ The last key verb is `summarise()`. It collapses a data frame to a single row:
 summarise(flights, delay = mean(dep_delay, na.rm = TRUE))
 ```

-(we'll come back to what that `na.rm = TRUE` means very shortly.)
+(We'll come back to what that `na.rm = TRUE` means very shortly.)

 `summarise()` is not terribly useful unless we pair it with `group_by()`. This changes the unit of analysis from the complete dataset to individual groups. Then, when you use the dplyr verbs on a grouped data frame they'll be automatically applied "by group". For example, if we applied exactly the same code to a data frame grouped by date, we get the average delay per date: