Move important tibble content earlier in the book (#1110)

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
2022-10-24 13:16:14 -05:00
parent c0461b11bd
commit f93a5daeeb
3 changed files with 83 additions and 97 deletions
--- a/data-import.qmd
+++ b/data-import.qmd
@@ -201,6 +201,29 @@ There are a few good reasons to favor readr functions over the base equivalents:
 -   They are more reproducible.
    Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else's.

+### Non-syntactic names
+
+A CSV file can have column names that are not valid R variable names; we refer to these as **non-syntactic** names.
+For example, the variables might not start with a letter or they might contain unusual characters like a space:
+
+```{r}
+df <- read_csv("data/non-syntactic.csv", col_types = list())
+df
+```
+
+You'll notice that they print surrounded by backticks, which you'll need to use when referring to them in other functions:
+
+```{r}
+df |> relocate(`2000`, .after = `:)`)
+```
+
+These values only need special handling when they appear in column names.
+If you turn them into data (e.g. with `pivot_longer()`) they are just regular strings:
+
+```{r}
+df |> pivot_longer(everything())
+```
+
 ### Exercises

 1.  What function would you use to read a file where fields were separated with "\|"?
@@ -232,6 +255,20 @@ There are a few good reasons to favor readr functions over the base equivalents:
    read_csv("a;b\n1;3")
    ```

+6.  Practice referring to non-syntactic names in the following data frame by:
+
+    a.  Extracting the variable called `1`.
+    b.  Plotting a scatterplot of `1` vs `2`.
+    c.  Creating a new column called `3` which is `2` divided by `1`.
+    d.  Renaming the columns to `one`, `two` and `three`.
+
+    ```{r}
+    annoying <- tibble(
+      `1` = 1:10,
+      `2` = `1` * 2 + rnorm(length(`1`))
+    )
+    ```
+
 ## Reading data from multiple files {#sec-readr-directory}

 Sometimes your data is split across multiple files instead of being contained in a single file.
@@ -326,9 +363,50 @@ file.remove("students-2.csv")
 file.remove("students.rds")
 ```

+## Data entry
+
+Sometimes you'll need to assemble a tibble "by hand", doing a little data entry in your R script.
+Two useful functions help you do this; they differ in whether you lay out the tibble by columns or by rows.
+`tibble()` works by column:
+
+```{r}
+tibble(
+  x = c(1, 2, 5), 
+  y = c("h", "m", "g"),
+  z = c(0.08, 0.83, 0.60)
+)
+```
+
+Note that every column in a tibble must be the same size, so you'll get an error if they're not:
+
+```{r}
+#| error: true
+
+tibble(
+  x = c(1, 2),
+  y = c("h", "m", "g"),
+  z = c(0.08, 0.83, 0.6)
+)
+```
+
+Laying out the data by column can make it hard to see how the rows are related, so an alternative is `tribble()`, short for **tr**ansposed t**ibble**, which lets you lay out your data row by row.
+`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
+This makes it possible to lay out small amounts of data in an easy-to-read form:
+
+```{r}
+tribble(
+  ~x, ~y, ~z,
+  1, "h", 0.08,
+  2, "m", 0.83,
+  5, "g", 0.60,
+)
+```
+
+We'll use `tibble()` and `tribble()` later in the book to construct small examples to demonstrate how various functions work.
+
 ## Summary

-In this chapter, you've learned how to use readr to load rectangular flat files from disk into R.
+In this chapter, you've learned how to load CSV files with `read_csv()` and to do your own data entry with `tibble()` and `tribble()`.
 You've learned how csv files work, some of the problems you might encounter, and how to overcome them.
 We'll come to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and googlesheets, @sec-rectangling from JSON, and @sec-scraping from websites.

--- a/data/non-syntactic.csv
+++ b/data/non-syntactic.csv
@@ -0,0 +1,2 @@
+:),x y,2000
+smile,space,number
--- a/tibble.qmd
+++ b/tibble.qmd
@@ -27,86 +27,6 @@ In this chapter we'll explore the **tibble** package, part of the core tidyverse
 library(tidyverse)
 ```

-## Creating tibbles
-
-If you need to make a tibble "by hand", you can use `tibble()` or `tribble()`.
-`tibble()` works by assembling individual vectors:
-
-```{r}
-x <- c(1, 2, 5)
-y <- c("a", "b", "h")
-
-tibble(x, y)
-```
-
-You can also optionally name the inputs, provide data inline with `c()`, and perform computation:
-
-```{r}
-tibble(
-  x1 = x,
-  x2 = c(10, 15, 25),
-  y = sqrt(x1^2 + x2^2)
-)
-```
-
-Every column in a data frame or tibble must be same length, so you'll get an error if the lengths are different:
-
-```{r}
-#| error: true
-
-tibble(
-  x = c(1, 5),
-  y = c("a", "b", "c")
-)
-```
-
-As the error suggests, individual values will be recycled to the same length as everything else:
-
-```{r}
-tibble(
-  x = 1:5,
-  y = "a",
-  z = TRUE
-)
-```
-
-Another way to create a tibble is with `tribble()`, which short for **tr**ansposed t**ibble**.
-`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
-This makes it possible to lay out small amounts of data in an easy to read form:
-
-```{r}
-tribble(
-  ~x, ~y, ~z,
-  "a", 2, 3.6,
-  "b", 1, 8.5
-)
-```
-
-Finally, if you have a regular `data.frame` you can turn it into to a tibble with `as_tibble()`:
-
-```{r}
-as_tibble(mtcars)
-```
-
-The inverse of `as_tibble()` is `as.data.frame()`; it converts a tibble back into a regular `data.frame`.
-
-## Non-syntactic names
-
-It's possible for a tibble to have column names that are not valid R variable names, names that are **non-syntactic**.
-For example, the variables might not start with a letter or they might contain unusual characters like a space.
-To refer to these variables, you need to surround them with backticks, `` ` ``:
-
-```{r}
-tb <- tibble(
-  `:)` = "smile", 
-  ` ` = "space",
-  `2000` = "number"
-)
-tb
-```
-
-You'll also need the backticks when working with these variables in other packages, like ggplot2, dplyr, and tidyr.
-
 ## Tibbles vs. data.frame

 There are two main differences in the usage of a tibble vs. a classic `data.frame`: printing and subsetting.
@@ -244,24 +164,10 @@ If you hit one of those functions, just use `as.data.frame()` to turn your tibbl

 3.  If you have the name of a variable stored in an object, e.g. `var <- "mpg"`, how can you extract the reference variable from a tibble?

-4.  Practice referring to non-syntactic names in the following data frame by:
-
-    a.  Extracting the variable called `1`.
-    b.  Plotting a scatterplot of `1` vs `2`.
-    c.  Creating a new column called `3` which is `2` divided by `1`.
-    d.  Renaming the columns to `one`, `two` and `three`.
-
-    ```{r}
-    annoying <- tibble(
-      `1` = 1:10,
-      `2` = `1` * 2 + rnorm(length(`1`))
-    )
-    ```
-
-5.  What does `tibble::enframe()` do?
+4.  What does `tibble::enframe()` do?
    When might you use it?

-6.  What option controls how many additional column names are printed at the footer of a tibble?
+5.  What option controls how many additional column names are printed at the footer of a tibble?

 ## Summary