Make table3 match tidyr version, adjust ex accordingly
parent d2b27da862
commit b6277d08fc
@@ -44,16 +44,6 @@ You can represent the same underlying data in multiple ways.
 The example below shows the same data organized in three different ways.
 Each dataset shows the same values of four variables: *country*, *year*, *population*, and number of documented *cases* of TB (tuberculosis), but each dataset organizes the values in a different way.

-```{r}
-#| echo: false
-
-table2 <- table1 |>
-  pivot_longer(cases:population, names_to = "type", values_to = "count")
-
-table3 <- table2 |>
-  pivot_wider(names_from = year, values_from = count)
-```
-
 ```{r}
 table1

@@ -136,7 +126,7 @@ ggplot(table1, aes(x = year, y = cases)) +

 1. For each of the sample tables, describe what each observation and each column represents.

-2. Sketch out the process you'd use to calculate the `rate` for `table2` and `table3`.
+2. Sketch out the process you'd use to calculate the `rate` from `table2`.
 You will need to perform four operations:

 a. Extract the number of TB cases per country per year.

@@ -360,7 +350,7 @@ There are two columns that are already variables and are easy to interpret: `cou
 They are followed by 56 columns like `sp_m_014`, `ep_m_4554`, and `rel_m_3544`.
 If you stare at these columns for long enough, you'll notice there's a pattern.
 Each column name is made up of three pieces separated by `_`.
-The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2534`/`3544`/`4554`/`5564/``65` is the `age` range (`014` represents 0-14, for example).
+The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2534`/`3544`/`4554`/``` 5564/``65 ``` is the `age` range (`014` represents 0-14, for example).

 So in this case we have six pieces of information recorded in `who2`: the country and the year (already columns); the method of diagnosis, the gender category, and the age range category (contained in the other column names); and the count of patients in that category (cell values).
 To organize these six pieces of information in six separate columns, we use `pivot_longer()` with a vector of column names for `names_to` and instructions for splitting the original variable names into pieces for `names_sep` as well as a column name for `values_to`:
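
The hunk ends before the code that this sentence introduces, so here is a minimal sketch of the call it describes. It is not the chunk from the book's source. The `who2_mini` tibble and its values are a hypothetical stand-in for `who2`, and the new column names `diagnosis`, `gender`, `age`, and `count` are assumptions chosen to match the six pieces of information listed above.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature stand-in for `who2`: two identifier columns plus
# two count columns whose names follow the method_gender_age pattern.
who2_mini <- tibble(
  country = c("A", "B"),
  year = c(2000, 2000),
  sp_m_014 = c(1, 2),
  ep_f_1524 = c(3, 4)
)

who2_long <- who2_mini |>
  pivot_longer(
    cols = !(country:year),                      # every column except the identifiers
    names_to = c("diagnosis", "gender", "age"),  # one new column per name piece
    names_sep = "_",                             # split the old column names on "_"
    values_to = "count"                          # cell values go into `count`
  )

who2_long
```

Each original count column contributes one row per country-year pair, so the two count columns in the stand-in become four rows with six columns.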