parent 810b9f6a3c
commit 424665c929
@@ -176,9 +176,11 @@ billboard
 
 In this dataset, each observation is a song.
 The first three columns (`artist`, `track` and `date.entered`) are variables that describe the song.
-Then we have 76 columns (`wk1`-`wk76`) that describe the rank of the song in each week.
+Then we have 76 columns (`wk1`-`wk76`) that describe the rank of the song in each week[^data-tidy-1].
 Here, the column names are one variable (the `week`) and the cell values are another (the `rank`).
 
+[^data-tidy-1]: The song will be included as long as it was in the top 100 at some point in 2000, and is tracked for up to 72 weeks after it appears.
+
 To tidy this data, we'll use `pivot_longer()`:
 
 ```{r, R.options=list(pillar.print_min = 10)}
@@ -202,9 +204,9 @@ Now let's turn our attention to the resulting, longer data frame.
 What happens if a song is in the top 100 for less than 76 weeks?
 Take 2 Pac's "Baby Don't Cry", for example.
 The above output suggests that it was only in the top 100 for 7 weeks, and all the remaining weeks are filled in with missing values.
-These `NA`s don't really represent unknown observations; they were forced to exist by the structure of the dataset[^data-tidy-1], so we can ask `pivot_longer()` to get rid of them by setting `values_drop_na = TRUE`:
+These `NA`s don't really represent unknown observations; they were forced to exist by the structure of the dataset[^data-tidy-2], so we can ask `pivot_longer()` to get rid of them by setting `values_drop_na = TRUE`:
 
-[^data-tidy-1]: We'll come back to this idea in @sec-missing-values.
+[^data-tidy-2]: We'll come back to this idea in @sec-missing-values.
 
 ```{r}
 billboard |>
@@ -216,7 +218,7 @@ billboard |>
 )
 ```
 
-The number of rows is now much lower, indicating that the rows with `NA`s were dropped.
+The number of rows is now much lower, indicating that many rows with `NA`s were dropped.
 
 You might also wonder what happens if a song is in the top 100 for more than 76 weeks?
 We can't tell from this data, but you might guess that additional columns `wk77`, `wk78`, ... would be added to the dataset.