Move important tibble content earlier in the book (#1110)
Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
@@ -201,6 +201,29 @@ There are a few good reasons to favor readr functions over the base equivalents:
-   They are more reproducible.
    Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else's.

### Non-syntactic names

It's possible for a CSV file to have column names that are not valid R variable names; we refer to these as **non-syntactic** names.
For example, the variables might not start with a letter or they might contain unusual characters like a space:

```{r}
df <- read_csv("data/non-syntactic.csv", col_types = list())
df
```

You'll notice that they print surrounded by backticks, which you'll need to use when referring to them in other functions:

```{r}
df |> relocate(`2000`, .after = `:)`)
```

These values only need special handling when they appear in column names.
If you turn them into data (e.g. with `pivot_longer()`) they are just regular strings:

```{r}
df |> pivot_longer(everything())
```

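If the backticks get tedious, one common approach is to rename the non-syntactic columns once, right after import, so the rest of your code can use ordinary names. The minimal sketch below rebuilds the same three columns with `tibble()` instead of reading `data/non-syntactic.csv`, so it runs without the file. The new names `smile`, `space`, and `number` are hypothetical choices, not part of the original example.

```{r}
library(tibble)
library(dplyr)

# Same column names as data/non-syntactic.csv, built inline so no file is needed
df <- tibble(`:)` = "smile", `x y` = "space", `2000` = "number")

# rename() takes new_name = old_name; the backticks are needed one last time here
df_renamed <- df |> rename(smile = `:)`, space = `x y`, number = `2000`)

# From here on, plain names work without backticks
df_renamed |> select(number, smile)
```

`rename()` keeps the column order, so `df_renamed` still has its columns in the order `smile`, `space`, `number`.
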
### Exercises

1.  What function would you use to read a file where fields were separated with "\|"?
@@ -232,6 +255,20 @@ There are a few good reasons to favor readr functions over the base equivalents:
    read_csv("a;b\n1;3")
    ```

6.  Practice referring to non-syntactic names in the following data frame by:

    a.  Extracting the variable called `1`.
    b.  Plotting a scatterplot of `1` vs `2`.
    c.  Creating a new column called `3` which is `2` divided by `1`.
    d.  Renaming the columns to `one`, `two` and `three`.

    ```{r}
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    ```

## Reading data from multiple files {#sec-readr-directory}

Sometimes your data is split across multiple files instead of being contained in a single file.
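A minimal sketch of one way to handle this, assuming readr 2.0 or later: `read_csv()` accepts a vector of paths and stacks the files row-wise, and its `id` argument adds a column recording which file each row came from. The files `part-1.csv` and `part-2.csv` and their columns are hypothetical. The sketch writes them to a temporary directory so that it runs anywhere.

```{r}
library(readr)

# Two small hypothetical files with identical columns, written to a temp directory
dir <- tempfile("readr-multi-")
dir.create(dir)
write_csv(data.frame(x = 1:2, y = c("a", "b")), file.path(dir, "part-1.csv"))
write_csv(data.frame(x = 3:4, y = c("c", "d")), file.path(dir, "part-2.csv"))

# list.files() finds the paths; read_csv() stacks the files and stores each
# row's source path in the column named by `id`
paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
combined <- read_csv(paths, id = "file", show_col_types = FALSE)
combined
```

This only works cleanly when every file has the same columns. The `file` column is what lets you tell the pieces apart afterwards.
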
@@ -326,9 +363,50 @@ file.remove("students-2.csv")
file.remove("students.rds")
```

## Data entry

Sometimes you'll need to assemble a tibble "by hand", doing a little data entry in your R script.
There are two useful functions to help you do this, which differ in whether you lay out the tibble by columns or by rows.
`tibble()` works by column:

```{r}
tibble(
  x = c(1, 2, 5),
  y = c("h", "m", "g"),
  z = c(0.08, 0.83, 0.60)
)
```

Note that every column in a tibble must be the same size, so you'll get an error if they're not:

```{r}
#| error: true

tibble(
  x = c(1, 2),
  y = c("h", "m", "g"),
  z = c(0.08, 0.83, 0.6)
)
```

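There is one exception to the size rule: a column of length one is recycled to match the others, and the error message above points this out. The minimal sketch below reuses the made-up values from the first `tibble()` call. The extra column name `source` is hypothetical.

```{r}
library(tibble)

# `source` has length one, so tibble() repeats it for every row
recycled <- tibble(
  x = c(1, 2, 5),
  y = c("h", "m", "g"),
  source = "manual entry"
)
recycled
```
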
Laying out the data by column can make it hard to see how the rows are related, so an alternative is `tribble()`, short for **tr**ansposed t**ibble**, which lets you lay out your data row by row.
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
This makes it possible to lay out small amounts of data in an easy-to-read form:

```{r}
tribble(
  ~x, ~y, ~z,
  "h", 1, 0.08,
  "m", 2, 0.83,
  "g", 5, 0.60,
)
```

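`tribble()` only changes how you type the data, not what you get. It should therefore produce the same data as the equivalent column-wise `tibble()` call. The quick check below uses the same made-up values as the `tribble()` example above.

```{r}
library(tibble)

by_row <- tribble(
  ~x, ~y, ~z,
  "h", 1, 0.08,
  "m", 2, 0.83,
  "g", 5, 0.60
)

by_col <- tibble(
  x = c("h", "m", "g"),
  y = c(1, 2, 5),
  z = c(0.08, 0.83, 0.60)
)

# Returns TRUE if names, types, and values agree; otherwise describes the differences
all.equal(as.data.frame(by_row), as.data.frame(by_col))
```
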
We'll use `tibble()` and `tribble()` later in the book to construct small examples to demonstrate how various functions work.

## Summary

In this chapter, you've learned how to use readr to load rectangular flat files from disk into R.
In this chapter, you've learned how to load CSV files with `read_csv()` and to do your own data entry with `tibble()` and `tribble()`.
You've learned how CSV files work, some of the problems you might encounter, and how to overcome them.
We'll return to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and Google Sheets, @sec-rectangling from JSON, and @sec-scraping from websites.

2 data/non-syntactic.csv Normal file
@@ -0,0 +1,2 @@
:),x y,2000
smile,space,number

98 tibble.qmd
@@ -27,86 +27,6 @@ In this chapter we'll explore the **tibble** package, part of the core tidyverse
library(tidyverse)
```

## Creating tibbles

If you need to make a tibble "by hand", you can use `tibble()` or `tribble()`.
`tibble()` works by assembling individual vectors:

```{r}
x <- c(1, 2, 5)
y <- c("a", "b", "h")

tibble(x, y)
```

You can also optionally name the inputs, provide data inline with `c()`, and perform computation:

```{r}
tibble(
  x1 = x,
  x2 = c(10, 15, 25),
  y = sqrt(x1^2 + x2^2)
)
```

Every column in a data frame or tibble must be the same length, so you'll get an error if the lengths are different:

```{r}
#| error: true

tibble(
  x = c(1, 5),
  y = c("a", "b", "c")
)
```

As the error suggests, individual values will be recycled to the same length as everything else:

```{r}
tibble(
  x = 1:5,
  y = "a",
  z = TRUE
)
```

Another way to create a tibble is with `tribble()`, which is short for **tr**ansposed t**ibble**.
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
This makes it possible to lay out small amounts of data in an easy-to-read form:

```{r}
tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
```

Finally, if you have a regular `data.frame`, you can turn it into a tibble with `as_tibble()`:

```{r}
as_tibble(mtcars)
```

The inverse of `as_tibble()` is `as.data.frame()`; it converts a tibble back into a regular `data.frame`.

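One detail is worth knowing when converting. `mtcars` stores the car names as row names, and `as_tibble()` drops row names by default unless you use its `rownames` argument to move them into a column. The minimal sketch below shows this. The column name `model` is an arbitrary, hypothetical choice.

```{r}
library(tibble)

# Default: the row names (car names) are discarded
cars_default <- as_tibble(mtcars)

# rownames = "model" keeps them as an ordinary character column, placed first
cars <- as_tibble(mtcars, rownames = "model")

# Round trip back to a plain data.frame
cars_df <- as.data.frame(cars)
class(cars_df)
```
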
## Non-syntactic names

It's possible for a tibble to have column names that are not valid R variable names; these are called **non-syntactic** names.
For example, the variables might not start with a letter or they might contain unusual characters like a space.
To refer to these variables, you need to surround them with backticks, `` ` ``:

```{r}
tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)
tb
```

You'll also need the backticks when working with these variables in other packages, like ggplot2, dplyr, and tidyr.

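Backticks are only needed where R would otherwise parse the name as code. When you supply a name as a string, for example with `[[` or with tidyselect's `all_of()` (re-exported by dplyr), it is just a regular character value. The minimal sketch below rebuilds the `tb` tibble from above.

```{r}
library(tibble)
library(dplyr)

tb <- tibble(`:)` = "smile", ` ` = "space", `2000` = "number")

# As a bare name inside dplyr, the backticks are required
tb |> select(`2000`)

# As strings, the same names need no backticks
tb[[" "]]
tb |> select(all_of(c(":)", "2000")))
```
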
## Tibbles vs. data.frame

There are two main differences in the usage of a tibble vs. a classic `data.frame`: printing and subsetting.
@@ -244,24 +164,10 @@ If you hit one of those functions, just use `as.data.frame()` to turn your tibbl

3.  If you have the name of a variable stored in an object, e.g. `var <- "mpg"`, how can you extract the referenced variable from a tibble?

4.  Practice referring to non-syntactic names in the following data frame by:

    a.  Extracting the variable called `1`.
    b.  Plotting a scatterplot of `1` vs `2`.
    c.  Creating a new column called `3` which is `2` divided by `1`.
    d.  Renaming the columns to `one`, `two` and `three`.

    ```{r}
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    ```

5.  What does `tibble::enframe()` do?
4.  What does `tibble::enframe()` do?
    When might you use it?

6.  What option controls how many additional column names are printed at the footer of a tibble?
5.  What option controls how many additional column names are printed at the footer of a tibble?

## Summary