Add exercise on group_by (#1203)

* Add exercise on group_by

* Don't eval the code chunks

* Edits + indentation
Mine Cetinkaya-Rundel 2023-01-02 20:44:46 -05:00 committed by GitHub
parent 29c8822d3b
commit 26a20c586a
1 changed file with 82 additions and 0 deletions


@@ -582,6 +582,88 @@ As you can see, when you summarize an ungrouped data frame, you get a single row
5. Explain what `count()` does in terms of the dplyr verbs you just learned.
What does the `sort` argument to `count()` do?
6. Suppose we have the following tiny data frame:
```{r}
df <- tibble(
  x = 1:5,
  y = c("a", "b", "a", "a", "b"),
  z = c("K", "K", "L", "L", "K")
)
```
a. What does the following code do?
Run it, analyze the result, and describe what `group_by()` does.
```{r}
#| eval: false
df |>
  group_by(y)
```
b. What does the following code do?
Run it, analyze the result, and describe what `arrange()` does.
Also, comment on how it's different from the `group_by()` in part (a).
```{r}
#| eval: false
df |>
  arrange(y)
```
c. What does the following code do?
Run it, analyze the result, and describe what the pipeline does.
```{r}
#| eval: false
df |>
  group_by(y) |>
  summarize(mean_x = mean(x))
```
d. What does the following code do?
Run it, analyze the result, and describe what the pipeline does.
Then, comment on what the message says.
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))
```
e. What does the following code do?
Run it, analyze the result, and describe what the pipeline does.
How is the output different from the one in part (d)?
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x), .groups = "drop")
```
f. What do the following pipelines do?
Run both, analyze the results, and describe what each pipeline does.
How are the outputs of the two pipelines different?
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))

df |>
  group_by(y, z) |>
  mutate(mean_x = mean(x))
```
## Case study: aggregates and sample size {#sec-sample-size}
Whenever you do any aggregation, it's always a good idea to include a count (`n()`).
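As a minimal sketch of that advice, the chunk below reuses the tiny data frame from exercise 6 (an illustrative choice, not the data the case study itself goes on to use) and reports the group size next to the mean. The column name `n` and the object name `counts` are assumptions made for this example.

```{r}
library(dplyr)

# Illustrative data: the same tiny data frame as in exercise 6
df <- tibble(
  x = 1:5,
  y = c("a", "b", "a", "a", "b"),
  z = c("K", "K", "L", "L", "K")
)

# Summarize each group of `y`, keeping the number of rows
# behind each mean so small groups are easy to spot
counts <- df |>
  group_by(y) |>
  summarize(
    mean_x = mean(x),
    n = n()
  )

counts
```

With the count alongside it, a mean based on only two or three rows is visibly less trustworthy than one based on many.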