Move important tibble content earlier in the book (#1110)
Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
parent c0461b11bd
commit f93a5daeeb
@@ -201,6 +201,29 @@ There are a few good reasons to favor readr functions over the base equivalents:
- They are more reproducible.
  Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else's.

### Non-syntactic names

It's possible for a CSV file to have column names that are not valid R variable names; we refer to these as **non-syntactic** names.
For example, the variables might not start with a letter or they might contain unusual characters like a space:

```{r}
df <- read_csv("data/non-syntactic.csv", col_types = list())
df
```

You'll notice that they print surrounded by backticks, which you'll need to use when referring to them in other functions:

```{r}
df |> relocate(`2000`, .after = `:)`)
```

These values only need special handling when they appear in column names.
If you turn them into data (e.g. with `pivot_longer()`) they are just regular strings:

```{r}
df |> pivot_longer(everything())
```
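
If you would rather not carry backticks through the rest of an analysis, one option is to rename the columns right after import. The following is a minimal sketch, assuming readr 2.0 or later so that literal data can be wrapped in `I()`. The inline string mirrors the two lines of `data/non-syntactic.csv`, and the new names `smiley`, `x_y`, and `year` are hypothetical:

```{r}
library(tidyverse)

# Inline copy of data/non-syntactic.csv; I() marks the string as literal data
# rather than a file path (assumes readr >= 2.0).
df <- read_csv(I(":),x y,2000\nsmile,space,number"), col_types = list())

# Backticks are needed only around the old, non-syntactic names.
# The new names (smiley, x_y, year) are illustrative, not taken from the book.
renamed <- df |>
  rename(smiley = `:)`, x_y = `x y`, year = `2000`)

names(renamed)
```

After the rename, later code can refer to the columns without any quoting.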

### Exercises

1. What function would you use to read a file where fields were separated with "\|"?

@@ -232,6 +255,20 @@ There are a few good reasons to favor readr functions over the base equivalents:
    read_csv("a;b\n1;3")
    ```

6. Practice referring to non-syntactic names in the following data frame by:

    a. Extracting the variable called `1`.
    b. Plotting a scatterplot of `1` vs `2`.
    c. Creating a new column called `3` which is `2` divided by `1`.
    d. Renaming the columns to `one`, `two` and `three`.

    ```{r}
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    ```

## Reading data from multiple files {#sec-readr-directory}

Sometimes your data is split across multiple files instead of being contained in a single file.

@@ -326,9 +363,50 @@ file.remove("students-2.csv")
file.remove("students.rds")
```

## Data entry

Sometimes you'll need to assemble a tibble "by hand" doing a little data entry in your R script.
There are two useful functions to help you do this which differ in whether you lay out the tibble by columns or by rows.
`tibble()` works by column:

```{r}
tibble(
  x = c(1, 2, 5),
  y = c("h", "m", "g"),
  z = c(0.08, 0.83, 0.60)
)
```

Note that every column in a tibble must be the same size, so you'll get an error if they're not:

```{r}
#| error: true

tibble(
  x = c(1, 2),
  y = c("h", "m", "g"),
  z = c(0.08, 0.83, 0.6)
)
```

Laying out the data by column can make it hard to see how the rows are related, so an alternative is `tribble()`, short for **tr**ansposed t**ibble**, which lets you lay out your data row by row.
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
This makes it possible to lay out small amounts of data in an easy-to-read form:

```{r}
tribble(
  ~x, ~y, ~z,
  "h", 1, 0.08,
  "m", 2, 0.83,
  "g", 5, 0.60,
)
```
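
To check that the two layouts really describe the same data, you can enter one small table both ways and compare the results. This is a minimal sketch; the object names `by_column` and `by_row` are illustrative:

```{r}
library(tibble)

# Column-wise entry: one vector per column.
by_column <- tibble(
  x = c("h", "m", "g"),
  y = c(1, 2, 5),
  z = c(0.08, 0.83, 0.60)
)

# Row-wise entry: the same values, one line per row.
by_row <- tribble(
  ~x,  ~y, ~z,
  "h", 1,  0.08,
  "m", 2,  0.83,
  "g", 5,  0.60
)

# TRUE when both tibbles have the same column names, types, and values.
isTRUE(all.equal(by_column, by_row))
```

Which form to use is mostly a matter of readability: the row-wise layout tends to be easier to scan when each row is a meaningful record.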

We'll use `tibble()` and `tribble()` later in the book to construct small examples to demonstrate how various functions work.

## Summary

In this chapter, you've learned how to use readr to load rectangular flat files from disk into R.
In this chapter, you've learned how to load CSV files with `read_csv()` and to do your own data entry with `tibble()` and `tribble()`.
You've learned how CSV files work, some of the problems you might encounter, and how to overcome them.
We'll come to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and Google Sheets, @sec-rectangling from JSON, and @sec-scraping from websites.

@@ -0,0 +1,2 @@
:),x y,2000
smile,space,number
98
tibble.qmd
@@ -27,86 +27,6 @@ In this chapter we'll explore the **tibble** package, part of the core tidyverse
library(tidyverse)
```

## Creating tibbles

If you need to make a tibble "by hand", you can use `tibble()` or `tribble()`.
`tibble()` works by assembling individual vectors:

```{r}
x <- c(1, 2, 5)
y <- c("a", "b", "h")

tibble(x, y)
```

You can also optionally name the inputs, provide data inline with `c()`, and perform computation:

```{r}
tibble(
  x1 = x,
  x2 = c(10, 15, 25),
  y = sqrt(x1^2 + x2^2)
)
```

Every column in a data frame or tibble must be the same length, so you'll get an error if the lengths are different:

```{r}
#| error: true

tibble(
  x = c(1, 5),
  y = c("a", "b", "c")
)
```

As the error suggests, individual values will be recycled to the same length as everything else:

```{r}
tibble(
  x = 1:5,
  y = "a",
  z = TRUE
)
```

Another way to create a tibble is with `tribble()`, which is short for **tr**ansposed t**ibble**.
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
This makes it possible to lay out small amounts of data in an easy-to-read form:

```{r}
tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
```

Finally, if you have a regular `data.frame` you can turn it into a tibble with `as_tibble()`:

```{r}
as_tibble(mtcars)
```

The inverse of `as_tibble()` is `as.data.frame()`; it converts a tibble back into a regular `data.frame`.
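
A round trip makes the two conversions concrete. This is a minimal sketch using the built-in `mtcars` data; the `rownames = "model"` argument and the object names are illustrative choices, made because `as_tibble()` drops row names unless you ask for them to be kept as a column:

```{r}
library(tibble)

# data.frame -> tibble; keep the car names as a regular column called "model"
# (by default as_tibble() would drop the row names).
cars_tbl <- as_tibble(mtcars, rownames = "model")
class(cars_tbl)

# tibble -> data.frame; the result no longer carries the tibble classes.
cars_df <- as.data.frame(cars_tbl)
class(cars_df)
```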

## Non-syntactic names

It's possible for a tibble to have column names that are not valid R variable names, names that are **non-syntactic**.
For example, the variables might not start with a letter or they might contain unusual characters like a space.
To refer to these variables, you need to surround them with backticks, `` ` ``:

```{r}
tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)
tb
```

You'll also need the backticks when working with these variables in other packages, like ggplot2, dplyr, and tidyr.

## Tibbles vs. data.frame

There are two main differences in the usage of a tibble vs. a classic `data.frame`: printing and subsetting.

@@ -244,24 +164,10 @@ If you hit one of those functions, just use `as.data.frame()` to turn your tibbl

3. If you have the name of a variable stored in an object, e.g. `var <- "mpg"`, how can you extract the reference variable from a tibble?

4. Practice referring to non-syntactic names in the following data frame by:

    a. Extracting the variable called `1`.
    b. Plotting a scatterplot of `1` vs `2`.
    c. Creating a new column called `3` which is `2` divided by `1`.
    d. Renaming the columns to `one`, `two` and `three`.

    ```{r}
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    ```

5. What does `tibble::enframe()` do?
4. What does `tibble::enframe()` do?
    When might you use it?

6. What option controls how many additional column names are printed at the footer of a tibble?
5. What option controls how many additional column names are printed at the footer of a tibble?

## Summary
