Add some explanatory text + diagrams

Hadley Wickham 2022-03-03 10:14:00 -06:00
parent ab41435eae
commit 7a74e6a1db
5 changed files with 63 additions and 0 deletions

@@ -228,6 +228,69 @@ billboard_tidy |>
scale_y_reverse()
```
### How does pivoting work?
Now that you've seen what pivoting can do for you, it's worth taking a little time to gain some intuition for what's happening to the data.
Let's make a very simple dataset to make it easier to see what's happening:
```{r}
df <- tribble(
  ~var, ~col1, ~col2,
  "A", 1, 2,
  "B", 3, 4,
  "C", 5, 6
)
```
Here we'll say there are three variables: `var` (already in a column), `names` (currently stored in the column names), and `values` (the cell values).
So we can tidy it with:
```{r}
df |>
  pivot_longer(
    cols = col1:col2,
    names_to = "names",
    values_to = "values"
  )
```
How does this transformation take place?
It's easier to see if we take it component by component.
Columns that are already variables need to be repeated, once for each column in `cols`, as shown in Figure \@ref(fig:pivot-variables).
```{r pivot-variables}
#| echo: false
#| out.width: ~
#| fig.cap: >
#|   Columns that are already variables need to be repeated, once for
#|   each column that is pivoted.
knitr::include_graphics("diagrams/tidy-data/variables.png", dpi = 144)
```
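The repetition can be imitated by hand in base R. This is only a sketch of the idea, not how `pivot_longer()` is implemented; `pivoted_cols` and `var_long` are names made up for the illustration.

```{r}
# Values of the column that is already a variable
var <- c("A", "B", "C")

# Names of the columns being pivoted (hypothetical helper vector)
pivoted_cols <- c("col1", "col2")

# Repeat each value of `var` once per pivoted column
var_long <- rep(var, each = length(pivoted_cols))
var_long
#> [1] "A" "A" "B" "B" "C" "C"
```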
The column names become values in a new variable, whose name is given by `names_to`, as shown in Figure \@ref(fig:pivot-names).
They need to be repeated once for each row in the original dataset.
```{r pivot-names}
#| echo: false
#| out.width: ~
#| fig.cap: >
#|   The column names of pivoted columns become a new column.
knitr::include_graphics("diagrams/tidy-data/column-names.png", dpi = 144)
```
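A rough base R equivalent, again with made-up helper names (`pivoted_cols`, `n_rows`, `names_long`), cycles through the whole set of column names once per original row. It is meant to build intuition rather than to mirror the real implementation.

```{r}
# Names of the pivoted columns and the number of rows in the original data
pivoted_cols <- c("col1", "col2")
n_rows <- 3

# Repeat the full set of column names once for each original row
names_long <- rep(pivoted_cols, times = n_rows)
names_long
#> [1] "col1" "col2" "col1" "col2" "col1" "col2"
```

Note the difference from the existing variables: those use `each =`, so every value repeats in place, while the column names use `times =`, so the whole vector cycles.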
The cell values also become values in a new variable, with a name given by `values_to`.
They are unwound row by row.
Figure \@ref(fig:pivot-values) illustrates the process.
```{r pivot-values}
#| echo: false
#| out.width: ~
#| fig.cap: >
#|   The number of values is preserved (not repeated), but they are
#|   unwound row by row.
knitr::include_graphics("diagrams/tidy-data/cell-values.png", dpi = 144)
```
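To make "row by row" concrete, the three pieces can be assembled by hand in base R. This is a sketch under the assumption that all pivoted columns share one type; it uses a plain data frame and illustrative names (`pivoted_cols`, `values_long`, `by_hand`), and it is not how `pivot_longer()` works internally.

```{r}
# The same toy data as a base R data frame
df <- data.frame(
  var = c("A", "B", "C"),
  col1 = c(1, 3, 5),
  col2 = c(2, 4, 6)
)
pivoted_cols <- c("col1", "col2")

# as.vector() reads a matrix column by column, so transposing first
# yields the cell values row by row
values_long <- as.vector(t(as.matrix(df[pivoted_cols])))
values_long
#> [1] 1 2 3 4 5 6

# Combine the repeated variable, the cycled names, and the unwound values
by_hand <- data.frame(
  var = rep(df$var, each = length(pivoted_cols)),
  names = rep(pivoted_cols, times = nrow(df)),
  values = values_long
)
by_hand
```

The resulting six rows pair each `var` with each column name and the matching cell value, which is the same layout the `pivot_longer()` call above produces.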
### Many variables in column names
A more challenging situation occurs when you have multiple variables crammed into the column names.
