Add exercise on group_by (#1203)

* Add exercise on group_by
* Don't eval the code chunks
* Edits + indentation
2023-01-02 20:44:46 -05:00 · 2023-01-02 20:44:46 -05:00 · 26a20c586a
parent 29c8822d3b
commit 26a20c586a
1 changed files with 82 additions and 0 deletions
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -582,6 +582,88 @@ As you can see, when you summarize an ungrouped data frame, you get a single row
 5.  Explain what `count()` does in terms of the dplyr verbs you just learned.
    What does the `sort` argument to `count()` do?

+6.  Suppose we have the following tiny data frame:
+
+    ```{r}
+    df <- tibble(
+      x = 1:5,
+      y = c("a", "b", "a", "a", "b"),
+      z = c("K", "K", "L", "L", "K")
+    )
+    ```
+
+    a.  What does the following code do?
+        Run it, analyze the result, and describe what `group_by()` does.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y)
+        ```
+
+    b.  What does the following code do?
+        Run it, analyze the result, and describe what `arrange()` does.
+        Also comment on how it's different from the `group_by()` in part (a).
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          arrange(y)
+        ```
+
+    c.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    d.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        Then, comment on what the message says.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    e.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        How is the output different from the one in part (d)?
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x), .groups = "drop")
+        ```
+
+    f.  What do the following pipelines do?
+        Run both, analyze the results, and describe what each pipeline does.
+        How are the outputs of the two pipelines different?
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+            
+        df |>
+          group_by(y, z) |>
+          mutate(mean_x = mean(x))
+        ```
+
 ## Case study: aggregates and sample size {#sec-sample-size}

 Whenever you do any aggregation, it's always a good idea to include a count (`n()`).