Merge pull request #38 from radugrosu/patch-3

Update tidy.Rmd
Hadley Wickham 2016-02-11 08:09:11 -06:00
commit cac428b199
1 changed file with 10 additions and 10 deletions

@@ -87,7 +87,7 @@ knitr::include_graphics("images/tidy-2.png")
*A data frame is a list of vectors that R displays as a table. When your data is tidy, the values of each variable fall in their own column vector.*
-As a result, you can extract the all of the values of a variable in a tidy data set by extracting the column vector that contains the variable. You can do this easily with R's list syntax, i.e.
+As a result, you can extract all the values of a variable in a tidy data set by extracting the column vector that contains the variable. You can do this easily with R's list syntax, i.e.
```{r}
table1$cases
@@ -247,7 +247,7 @@ Every cell in a table of data contains one half of a key value pair, as does eve
table2
```
-In `table2`, the `key` column contains only keys (and not just because the column is labelled `key`). Conveniently, the `value` column contains the values associated with those keys.
+In `table2`, the `key` column contains only keys (and not just because the column is labeled `key`). Conveniently, the `value` column contains the values associated with those keys.
You can use the `spread()` function to tidy this layout.
@@ -269,7 +269,7 @@ knitr::include_graphics("images/tidy-8.png")
*`spread()` distributes a pair of key:value columns into a field of cells. The unique keys in the key column become the column names of the field of cells.*
-You can see that `spread()` maintains each of the relationships expressed in the original data set. The output contains the four original variables, *country*, *year*, *population*, and *cases*, and the values of these variables are grouped according to the orginal observations. As a bonus, now the layout of these relationships is tidy.
+You can see that `spread()` maintains each of the relationships expressed in the original data set. The output contains the four original variables, *country*, *year*, *population*, and *cases*, and the values of these variables are grouped according to the original observations. As a bonus, now the layout of these relationships is tidy.
`spread()` takes three optional arguments in addition to `data`, `key`, and `value`:
@@ -367,7 +367,7 @@ You can also pass an integer or vector of integers to `sep`. `separate()` will i
separate(table3, year, into = c("century", "year"), sep = 2)
```
-You can futher customize `separate()` with the `remove`, `convert`, and `extra` arguments:
+You can further customize `separate()` with the `remove`, `convert`, and `extra` arguments:
- **`remove`** - Set `remove = FALSE` to retain the column of values that were separated in the final data frame.
- **`convert`** - By default, `separate()` will return new columns as character columns. Set `convert = TRUE` to convert new columns to double (numeric), integer, logical, complex, and factor columns with `type.convert()`.