parent
1729264d91
commit
daaa861f74
|
@ -548,9 +548,11 @@ There are a few other general strategies to help you parse files:
|
|||
frame.
|
||||
|
||||
```{r}
|
||||
df <- tibble(
|
||||
x = c("1", "2", "3"),
|
||||
y = c("1.21", "2.32", "4.56")
|
||||
df <- tribble(
|
||||
~x, ~y,
|
||||
"1", "1.21",
|
||||
"2", "2.32",
|
||||
"3", "4.56"
|
||||
)
|
||||
df
|
||||
|
||||
|
|
|
@ -542,7 +542,12 @@ You can also perform transformations inside the model formula. For example, `log
|
|||
Again, if you get confused about what your model is doing, you can always use `model_matrix()` to see exactly what equation `lm()` is fitting:
|
||||
|
||||
```{r}
|
||||
df <- tibble(y = 1:3, x = 1:3)
|
||||
df <- tribble(
|
||||
~y, ~x,
|
||||
1, 1,
|
||||
2, 2,
|
||||
3, 3
|
||||
)
|
||||
model_matrix(df, y ~ x^2 + x)
|
||||
model_matrix(df, y ~ I(x^2) + x)
|
||||
```
|
||||
|
|
|
@ -295,6 +295,16 @@ tibble(
|
|||
)
|
||||
```
|
||||
|
||||
It's even easier with `tribble()` as it can automatically work out that you need a list:
|
||||
|
||||
```{r}
|
||||
tribble(
|
||||
~x, ~y,
|
||||
1:3, "1, 2",
|
||||
3:5, "3, 4, 5"
|
||||
)
|
||||
```
|
||||
|
||||
List-columns are often most useful as intermediate data structure. They're hard to work with directly, because most R functions work with atomic vectors or data frames, but the advantage of keeping related items together in a data frame is worth a little hassle.
|
||||
|
||||
Generally there are three parts of an effective list-column pipeline:
|
||||
|
@ -349,7 +359,11 @@ gapminder %>%
|
|||
Some useful functions take an atomic vector and return a list. For example, in [strings] you learned about `stringr::str_split()` which takes a character vector and returns a list of character vectors. If you use that inside mutate, you'll get a list-column:
|
||||
|
||||
```{r}
|
||||
df <- tibble(x1 = c("a,b,c", "d,e,f,g"))
|
||||
df <- tribble(
|
||||
~x1,
|
||||
"a,b,c",
|
||||
"d,e,f,g"
|
||||
)
|
||||
|
||||
df %>%
|
||||
mutate(x2 = stringr::str_split(x1, ","))
|
||||
|
@ -478,12 +492,11 @@ These are described in more detail below.
|
|||
If you can reduce your list column to an atomic vector then it will be a regular column. For example, you can always summarise an object with it's type and length, so this code will work regardless of what sort of list-column you have:
|
||||
|
||||
```{r}
|
||||
df <- tibble(
|
||||
x = list(
|
||||
letters[1:5],
|
||||
1:3,
|
||||
runif(5)
|
||||
)
|
||||
df <- tribble(
|
||||
~x,
|
||||
letters[1:5],
|
||||
1:3,
|
||||
runif(5)
|
||||
)
|
||||
|
||||
df %>% mutate(
|
||||
|
@ -497,11 +510,10 @@ This is the same basic information that you get from the default tbl print metho
|
|||
Don't forget about the `map_*()` shortcuts - you can use `map_chr(x, "apple")` to extract the string stored in `apple` for each element of `x`. This is useful for pulling apart nested lists into regular columns. Use the `.null` argument to provide a value to use if the element is missing (instead of returning `NULL`):
|
||||
|
||||
```{r}
|
||||
df <- tibble(
|
||||
x = list(
|
||||
list(a = 1, b = 2),
|
||||
list(a = 2, c = 4)
|
||||
)
|
||||
df <- tribble(
|
||||
~x,
|
||||
list(a = 1, b = 2),
|
||||
list(a = 2, c = 4)
|
||||
)
|
||||
df %>% mutate(
|
||||
a = map_dbl(x, "a"),
|
||||
|
@ -522,19 +534,19 @@ This means that you can't simultaneously unnest two columns that contain differe
|
|||
```{r, error = TRUE}
|
||||
# Ok, because y and z have the same number of elements in
|
||||
# every row
|
||||
df1 <- tibble(
|
||||
x = 1:2,
|
||||
y = list(c("a", "b"), "c"),
|
||||
z = list(1:2, 3)
|
||||
df1 <- tribble(
|
||||
~x, ~y, ~z,
|
||||
1, c("a", "b"), 1:2,
|
||||
2, "c", 3
|
||||
)
|
||||
df1
|
||||
df1 %>% unnest(y, z)
|
||||
|
||||
# Doesn't work because y and z have different number of elements
|
||||
df2 <- tibble(
|
||||
x = 1:2,
|
||||
y = list("a", c("b", "c")),
|
||||
z = list(1:2, 3)
|
||||
df2 <- tribble(
|
||||
~x, ~y, ~z,
|
||||
1, "a", 1:2,
|
||||
2, c("b", "c"), 3
|
||||
)
|
||||
df2
|
||||
df2 %>% unnest(y, z)
|
||||
|
|
|
@ -206,8 +206,18 @@ To help you learn how joins work, I'm going to use a visual representation:
|
|||
knitr::include_graphics("diagrams/join-setup.png")
|
||||
```
|
||||
```{r}
|
||||
(x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3")))
|
||||
(y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y3")))
|
||||
x <- tribble(
|
||||
~key, ~val_x,
|
||||
1, "x1",
|
||||
2, "x2",
|
||||
3, "x3"
|
||||
)
|
||||
y <- tribble(
|
||||
~key, ~val_y,
|
||||
1, "y1",
|
||||
2, "y2",
|
||||
4, "y3"
|
||||
)
|
||||
```
|
||||
|
||||
The coloured column represents the "key" variable: these are used to match the rows between the tables. The grey column represents the "value" column that is carried along for the ride. In these examples I'll show a single key variable and single value variable, but idea generalises in a straightforward way to multiple keys and multiple values.
|
||||
|
@ -288,8 +298,18 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
|
|||
and a foreign key in `x`.
|
||||
|
||||
```{r}
|
||||
x <- tibble(key = c(1, 2, 2, 1), val_x = stringr::str_c("x", 1:4))
|
||||
y <- tibble(key = 1:2, val_y = stringr::str_c("y", 1:2))
|
||||
x <- tribble(
|
||||
~key, ~val_x,
|
||||
1, "x1",
|
||||
2, "x2",
|
||||
2, "x3",
|
||||
1, "x4"
|
||||
)
|
||||
y <- tribble(
|
||||
~key, ~val_y,
|
||||
1, "y1",
|
||||
2, "y2"
|
||||
)
|
||||
left_join(x, y, by = "key")
|
||||
```
|
||||
|
||||
|
@ -302,8 +322,20 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
|
|||
```
|
||||
|
||||
```{r}
|
||||
x <- tibble(key = c(1, 2, 2, 3), val_x = stringr::str_c("x", 1:4))
|
||||
y <- tibble(key = c(1, 2, 2, 3), val_y = stringr::str_c("y", 1:4))
|
||||
x <- tribble(
|
||||
~key, ~val_x,
|
||||
1, "x1",
|
||||
2, "x2",
|
||||
2, "x3",
|
||||
3, "x4"
|
||||
)
|
||||
y <- tribble(
|
||||
~key, ~val_y,
|
||||
1, "y1",
|
||||
2, "y2",
|
||||
2, "y3",
|
||||
3, "y4"
|
||||
)
|
||||
left_join(x, y, by = "key")
|
||||
```
|
||||
|
||||
|
@ -543,8 +575,16 @@ The final type of two-table verb are the set operations. Generally, I use these
|
|||
Given this simple data:
|
||||
|
||||
```{r}
|
||||
(df1 <- tibble(x = 1:2, y = c(1L, 1L)))
|
||||
(df2 <- tibble(x = 1:2, y = 1:2))
|
||||
df1 <- tribble(
|
||||
~x, ~y,
|
||||
1, 1,
|
||||
2, 1
|
||||
)
|
||||
df2 <- tribble(
|
||||
~x, ~y,
|
||||
1, 1,
|
||||
1, 2
|
||||
)
|
||||
```
|
||||
|
||||
The four possibilities are:
|
||||
|
|
|
@ -499,11 +499,12 @@ This works because every geom has a default stat; and every stat has a default g
|
|||
is generated by counting rows.
|
||||
|
||||
```{r, warning = FALSE}
|
||||
demo <- tibble(
|
||||
a = c("bar_1", "bar_2", "bar_3"),
|
||||
b = c(20, 30, 40)
|
||||
demo <- tribble(
|
||||
~a, ~b,
|
||||
"bar_1", 20,
|
||||
"bar_2", 30,
|
||||
"bar_3", 40
|
||||
)
|
||||
demo
|
||||
|
||||
ggplot(data = demo) +
|
||||
geom_bar(mapping = aes(x = a, y = b), stat = "identity")
|
||||
|
|
Loading…
Reference in New Issue