Spell check suggestions (#259)
parent f9901e3e54
commit 2c0c6a8be5
14
tidy.Rmd
@@ -119,7 +119,7 @@ The second step is to resolve one of two common problems:
 
 1. One variable might be spread across multiple columns.
 
-1. One observation might be scattered across mutliple rows.
+1. One observation might be scattered across multiple rows.
 
 Typically a dataset will only suffer from one of these problems; it'll only suffer from both if you're really unlucky! To fix these problems, you'll need the two most important functions in tidyr: `gather()` and `spread()`.
 
@@ -185,10 +185,10 @@ To tidy this up, we first analyse the representation in similar way to `gather()
 * The column that contains variable names, the `key` column. Here, it's
 `type`.
 
-* The column that contains values froms multiple variables, the `value`
+* The column that contains values forms multiple variables, the `value`
 column. Here it's `count`.
 
-Once we've figured that out, we can use `spread()`, as shown progammatically below, and visually in Figure \@ref(fig:tidy-spread).
+Once we've figured that out, we can use `spread()`, as shown programmatically below, and visually in Figure \@ref(fig:tidy-spread).
 
 ```{r}
 spread(table2, key = type, value = count)
@@ -317,7 +317,7 @@ table5 %>%
 unite(new, century, year)
 ```
 
-In this case we also need to use the `sep` arguent. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`:
+In this case we also need to use the `sep` argument. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`:
 
 ```{r}
 table5 %>%
@@ -345,7 +345,7 @@ table5 %>%
 
 ## Missing values
 
-Changing the representation of a dataset brings up an important subtlety of missing values. Suprisingly, a value can be missing in one of two possible ways:
+Changing the representation of a dataset brings up an important subtlety of missing values. Surprisingly, a value can be missing in one of two possible ways:
 
 * __Explicitly__, i.e. flagged with `NA`.
 * __Implicitly__, i.e. simply not present in the data.
@@ -442,7 +442,7 @@ The best place to start is almost always to gathering together the columns that
 in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
 these are likely to be values, not variables.
 
-So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells repesent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
+So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
 
 ```{r}
 who1 <- who %>%
@@ -550,7 +550,7 @@ who %>%
 
 ## Non-tidy data
 
-Before we continue on to other topics, it's worth talking briefly about non-tidy data. Earlier in the chapter, I used the perjorative term "messy" to refer to non-tidy data. That's an oversimplification: there are lots of useful and well founded data structures that are not tidy data. There are two mains reasons to use other data structures:
+Before we continue on to other topics, it's worth talking briefly about non-tidy data. Earlier in the chapter, I used the pejorative term "messy" to refer to non-tidy data. That's an oversimplification: there are lots of useful and well founded data structures that are not tidy data. There are two mains reasons to use other data structures:
 
 * Alternative representations may have substantial performance or space
 advantages.