commit cac428b199
8 tidy.Rmd
@ -87,7 +87,7 @@ knitr::include_graphics("images/tidy-2.png")
 *A data frame is a list of vectors that R displays as a table. When your data is tidy, the values of each variable fall in their own column vector.*
 
-As a result, you can extract the all of the values of a variable in a tidy data set by extracting the column vector that contains the variable. You can do this easily with R's list syntax, i.e.
+As a result, you can extract all the values of a variable in a tidy data set by extracting the column vector that contains the variable. You can do this easily with R's list syntax, i.e.
 
 ```{r}
 table1$cases
@ -247,7 +247,7 @@ Every cell in a table of data contains one half of a key value pair, as does eve
 table2
 ```
 
-In `table2`, the `key` column contains only keys (and not just because the column is labelled `key`). Conveniently, the `value` column contains the values associated with those keys.
+In `table2`, the `key` column contains only keys (and not just because the column is labeled `key`). Conveniently, the `value` column contains the values associated with those keys.
 
 You can use the `spread()` function to tidy this layout.
@ -269,7 +269,7 @@ knitr::include_graphics("images/tidy-8.png")
 *`spread()` distributes a pair of key:value columns into a field of cells. The unique keys in the key column become the column names of the field of cells.*
 
-You can see that `spread()` maintains each of the relationships expressed in the original data set. The output contains the four original variables, *country*, *year*, *population*, and *cases*, and the values of these variables are grouped according to the orginal observations. As a bonus, now the layout of these relationships is tidy.
+You can see that `spread()` maintains each of the relationships expressed in the original data set. The output contains the four original variables, *country*, *year*, *population*, and *cases*, and the values of these variables are grouped according to the original observations. As a bonus, now the layout of these relationships is tidy.
 
 `spread()` takes three optional arguments in addition to `data`, `key`, and `value`:
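For reference, the `spread()` behavior described in this hunk might be sketched as follows. The data frame `long` is a hypothetical stand-in for `table2` with illustrative values, and the example assumes the tidyr package is installed; it is not taken from the file being changed.

```r
library(tidyr)

# Hypothetical stand-in for table2: a `key` column naming the variable
# and a `value` column holding the matching value (illustrative numbers).
long <- data.frame(
  country = rep(c("Afghanistan", "Brazil"), each = 4),
  year    = rep(c(1999, 1999, 2000, 2000), times = 2),
  key     = rep(c("cases", "population"), times = 4),
  value   = c(745, 19987071, 2666, 20595360,
              37737, 172006362, 80488, 174504898),
  stringsAsFactors = FALSE
)

# Each unique key becomes a column name; each value moves to the cell
# for its country/year row under the matching key column.
wide <- spread(long, key, value)
wide
```

In this sketch, `wide` should hold one row per country and year, with `cases` and `population` as separate columns.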
@ -367,7 +367,7 @@ You can also pass an integer or vector of integers to `sep`. `separate()` will i
 separate(table3, year, into = c("century", "year"), sep = 2)
 ```
 
-You can futher customize `separate()` with the `remove`, `convert`, and `extra` arguments:
+You can further customize `separate()` with the `remove`, `convert`, and `extra` arguments:
 
 - **`remove`** - Set `remove = FALSE` to retain the column of values that were separated in the final data frame.
 - **`convert`** - By default, `separate()` will return new columns as character columns. Set `convert = TRUE` to convert new columns to double (numeric), integer, logical, complex, and factor columns with `type.convert()`.
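The `remove` and `convert` arguments listed in this hunk might be exercised as in the sketch below. The data frame `df` and the new column name `yy` are hypothetical, chosen so that the retained `year` column does not clash with a new column; the example assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical stand-in for table3, with the year stored as text.
df <- data.frame(
  country = c("Afghanistan", "Brazil"),
  year    = c("1999", "2000"),
  stringsAsFactors = FALSE
)

# sep = 2 splits after the second character.
# remove = FALSE keeps the original `year` column in the result.
# convert = TRUE asks separate() to convert the new character columns
# to a more specific type where possible.
out <- separate(df, year, into = c("century", "yy"), sep = 2,
                remove = FALSE, convert = TRUE)
str(out)
```

Here `str(out)` should show the original `year` column alongside `century` and `yy`, with the two new columns no longer stored as character.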