Fixes for dev tidyr
parent 5ef6a6af54
commit 40a56c55ed
49
strings.qmd
@@ -261,10 +261,10 @@ Working from <https://github.com/tidyverse/tidyr/pull/1304>.
 It's very common for multiple variables to be crammed together into a single string.
 In this section you'll learn how to use four tidyr functions to extract them:
 
-- `df |> separate_by_longer(col, sep)`
-- `df |> separate_at_longer(col, width)`
-- `df |> separate_by_wider(col, sep, names)`
-- `df |> separate_at_wider(col, widths)`
+- `df |> separate_longer_delim(col, delim)`
+- `df |> separate_longer_position(col, width)`
+- `df |> separate_wider_delim(col, delim, names)`
+- `df |> separate_wider_position(col, widths)`
 
 If you look closely you can see there's a common pattern here: `separate` followed by `by` or `at`, followed by `longer` or `wider`.
 `by` splits up a string with a separator like `", "` or `" "`.
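
The rename in this hunk can be summarised as a lookup table. The following is a minimal sketch in base R, not part of the commit: `renames` and `update_calls()` are hypothetical names introduced here for illustration, and the mapping assumes that the old `separate_at_wider()` corresponds to `separate_wider_position()`, as the later hunk uses it. Note that the unchanged context lines still describe the old `by`/`at` pattern; under the new names the pattern is `separate_`, then `longer` or `wider`, then `delim` or `position`.

```r
# Old names (removed lines) mapped to new names (added lines).
# `renames` and `update_calls()` are illustrative helpers, not tidyr API.
renames <- c(
  separate_by_longer = "separate_longer_delim",
  separate_at_longer = "separate_longer_position",
  separate_by_wider  = "separate_wider_delim",
  separate_at_wider  = "separate_wider_position"
)

# Rewrite old-style function names in a character vector of source lines.
# Only the function name is replaced; renamed arguments are left untouched.
update_calls <- function(lines, map = renames) {
  for (old in names(map)) {
    lines <- gsub(paste0(old, "("), paste0(map[[old]], "("), lines, fixed = TRUE)
  }
  lines
}

update_calls(c("df1 |> separate_by_longer(x)", "df4 |> separate_at_wider(x, w)"))
```

The helper does not touch arguments, so the `sep` to `delim` change made in this commit would still need to be applied by hand.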
@@ -274,80 +274,63 @@ If you look closely you can see there's a common pattern here: `separate` follow
 There's one more member of this family, `separate_regex_wider()`, that we'll come back to in @sec-regular-expressions.
 It's the most flexible of the `at` forms but you need to know a bit about regular expressions in order to use it.
 
-```{r}
-#| include: false
-has_dev_tidyr <- packageVersion("tidyr") >= "1.2.1.9001"
-```
-
 The next two sections will give you the basic idea behind these separate functions, and then we'll work through a few case studies that require multiple uses.
 
 ### Splitting into rows
 
-`separate_by_longer()` and `separate_at_longer()` are most useful when the number of components varies from row to row.
-`separate_by_longer()` arises most commonly:
+`separate_longer_delim()` and `separate_longer_position()` are most useful when the number of components varies from row to row.
+`separate_longer_delim()` arises most commonly:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df1 <- tibble(x = c("a,b,c", "d,e", "f"))
 df1 |>
-  separate_by_longer(x, sep = ",")
+  separate_longer_delim(x, delim = ",")
 ```
 
 (If the separators vary, you can use a regular expression instead, if you're familiar with them.)
 
-It's rarer to see `separate_at_longer()` in the wild, but some older datasets can adopt a very compact format where each character is used to record a value:
+It's rarer to see `separate_longer_position()` in the wild, but some older datasets can adopt a very compact format where each character is used to record a value:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df2 <- tibble(x = c("1211", "131", "21"))
 df2 |>
-  separate_at_longer(x, width = 1)
+  separate_longer_position(x, width = 1)
 ```
 
 ### Splitting into columns
 
-`separate_by_wider()` and `separate_at_wider()` are most useful when there are a fixed number of components in each string, and you want to spread them into columns.
+`separate_wider_delim()` and `separate_wider_position()` are most useful when there are a fixed number of components in each string, and you want to spread them into columns.
 They are more complicated than their `longer` equivalents because you need to name the columns.
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
 df3 |>
-  separate_by_wider(x, sep = ",", names = c("letter", "number", "year"))
+  separate_wider_delim(x, delim = ",", names = c("letter", "number", "year"))
 ```
 
 If a specific value is not useful you can use `NA` to omit it from the results:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
 df3 |>
-  separate_by_wider(x, sep = ",", names = c("letter", NA, "year"))
+  separate_wider_delim(x, delim = ",", names = c("letter", NA, "year"))
 ```
 
-Alternatively, you can provide `names_sep` and `separate_by_wider()` will use that separator to name automatically:
+Alternatively, you can provide `names_sep` and `separate_wider_delim()` will use that separator to name automatically:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 |>
-  separate_by_wider(x, sep = ",", names_sep = "_")
+  separate_wider_delim(x, delim = ",", names_sep = "_")
 ```
 
-`separate_at_wider()` works a little differently, because you typically want to specify the width of each column.
+`separate_wider_position()` works a little differently, because you typically want to specify the width of each column.
 So you give it a named integer vector, where the name gives the name of the new column and the value is the number of characters it occupies.
 You can omit values from the output by not naming them:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df4 <- tibble(x = c("202215TX", "202122LA", "202325CA"))
 df4 |>
-  separate_at_wider(x, c(year = 4, age = 2, state = 2))
+  separate_wider_position(x, c(year = 4, age = 2, state = 2))
 ```
 
 ### Case studies
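
The text in this hunk says that values can be omitted from the output, either with `NA` in `names` or by leaving a width unnamed, but the final chunk names all three widths. A minimal sketch of both forms follows. It is not part of the commit and assumes an installed tidyr that exports the renamed functions (1.3.0 or later, to my knowledge) and R 4.1 or later for the native pipe. `by_delim` and `by_position` are names introduced here for illustration.

```r
library(tidyr)  # assumed to export separate_wider_delim() and separate_wider_position()

df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
df4 <- tibble(x = c("202215TX", "202122LA", "202325CA"))

# NA in `names` drops the second delimited component from the output.
by_delim <- df3 |>
  separate_wider_delim(x, delim = ",", names = c("letter", NA, "year"))

# The unnamed width of 2 is still consumed from the string,
# but no column is created for it, so the age field is dropped.
by_position <- df4 |>
  separate_wider_position(x, widths = c(year = 4, 2, state = 2))

names(by_delim)
names(by_position)
```

In both cases the skipped component still has to be accounted for (as an `NA` name or an unnamed width) so that the remaining pieces line up with the right columns.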