Update rectangling.qmd (#1069)

It is a very nicely written chapter. Here I provided a few corrections on typos and errors.
Y. Yu 2022-08-16 07:55:04 -04:00 committed by GitHub
parent 476f2c8282
commit d080f3279c
1 changed files with 13 additions and 13 deletions

@ -126,8 +126,8 @@ knitr::include_graphics("screenshots/View-3.png", dpi = 220)
### List-columns
Lists can also live inside a tibble, where we call them list-columns.
List-columns are useful because they allow you to shoehorn in objects that wouldn't wouldn't usually belong in a tibble.
In particular, list-columns are are used a lot in the [tidymodels](https://www.tidymodels.org) ecosystem, because they allows you to store things like models or resamples in a data frame.
List-columns are useful because they allow you to shoehorn in objects that wouldn't usually belong in a tibble.
In particular, list-columns are used a lot in the [tidymodels](https://www.tidymodels.org) ecosystem, because they allow you to store things like models or resamples in a data frame.
Here's a simple example of a list-column:
@ -187,7 +187,7 @@ It's easier to use list-columns with tibbles because `tibble()` treats lists lik
## Unnesting
Now that you've learned the basics of lists and list-columns, lets explore how you can turn them back into regular rows and columns.
Now that you've learned the basics of lists and list-columns, let's explore how you can turn them back into regular rows and columns.
We'll start with very simple sample data so you can get the basic idea, and then switch to more realistic examples in the next section.
List-columns tend to come in two basic forms: named and unnamed.
@ -195,7 +195,7 @@ When the children are **named**, they tend to have the same names in every row.
When the children are **unnamed**, the number of elements tends to vary from row-to-row.
The following code creates an example of each.
In `df1`, every element of list-column `y` has two elements named `a` and `b`.
If `df2`, the elements of list-column `y` are unnamed and vary in length.
In `df2`, the elements of list-column `y` are unnamed and vary in length.
```{r}
df1 <- tribble(
@ -316,7 +316,7 @@ You might wonder if this breaks the commandment that every element of a column m
What happens if you find this problem in a dataset you're trying to rectangle?
There are two basic options.
You could use the `transform` argument to coerce all inputs to a common type.
It's not particularly useful here because there's only really one class that these five class can be converted to: character.
It's not particularly useful here because there's only really one class that these five classes can be converted to: character.
```{r}
df4 |>
@ -371,7 +371,7 @@ These are good to know about when you're other people's code and for tackling ra
## Case studies
So far you've learned about the simplest case of list-columns, where rectangling only requires a single call to `unnest_longer()` or `unnest_wider()`.
The main difference between real data and these simple examples is that real data typically containsmultiple levels of nesting that requires multiple calls to `unnest_longer()` and `unnest_wider()`.
The main difference between real data and these simple examples is that real data typically contains multiple levels of nesting that require multiple calls to `unnest_longer()` and `unnest_wider()`.
This section will work through four real rectangling challenges using datasets from the repurrrsive package that are inspired by datasets that we've encountered in the wild.
### Very wide data
@ -426,7 +426,7 @@ repos |>
You can use this to work back to understand how `gh_repos` was structured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.
`owner` is another list-column, and since it a contains a named list, we can use `unnest_wider()` to get at the values:
`owner` is another list-column, and since it contains a named list, we can use `unnest_wider()` to get at the values:
```{r}
#| error: true
@ -624,7 +624,7 @@ locations |>
unnest_wider(location)
```
Extracting the bounds requires a few more steps
Extracting the bounds requires a few more steps:
```{r}
locations |>
@ -649,7 +649,7 @@ locations |>
Note how we unnest two columns simultaneously by supplying a vector of variable names to `unnest_wider()`.
This somewhere that `hoist()`, mentioned briefly above, can be useful.
This is somewhere that `hoist()`, mentioned briefly above, can be useful.
Once you've discovered the path to get to the components you're interested in, you can extract them directly using `hoist()`:
```{r}
@ -711,17 +711,17 @@ Four of them are scalars:
- The simplest type is a null, which is written `null`, which plays the same role as both `NULL` and `NA` in R. It represents the absence of data.
- A **string** is much like a string in R, but must use double quotes, not single quotes.
- A **number** is similar to R's numbers: they can be use integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support Inf, -Inf, or NaN.
- A **number** is similar to R's numbers: they can be integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support Inf, -Inf, or NaN.
- A **boolean** is similar to R's `TRUE` and `FALSE`, but use lower case `true` and `false`.
JSON's strings, numbers, and booleans are pretty similar to R's character, numeric, and logical vectors.
The main difference is that JSON's scalars can only represent a single value.
To represent multiple values you need to use one of the two remaining two types, arrays and objects.
To represent multiple values you need to use one of the two remaining types, arrays and objects.
Both arrays and objects are similar to lists in R; the difference is whether or not they're named.
An **array** is like an unnamed list, and is written with `[]`.
For example `[1, 2, 3]` is an array containing 3 numbers, and `[null, 1, "string", false]` is an array that contains a null, a number, a string, and a boolean.
An **object** is like a named list, and they're written with `{}`.
An **object** is like a named list, and it's written with `{}`.
For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
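To see how these types land in R, here is a minimal sketch that uses `parse_json()` from the jsonlite package (assumed to be installed). The JSON strings are made-up illustrations, not data from the chapter.

```{r}
library(jsonlite)

# Scalars: each JSON scalar becomes a single R value; null becomes NULL
str(parse_json('null'))
str(parse_json('"a string"'))
str(parse_json('1.23e3'))
str(parse_json('true'))

# An array becomes an unnamed list, so its elements can have mixed types
arr <- parse_json('[null, 1, "string", false]')
str(arr)

# An object becomes a named list
obj <- parse_json('{"x": 1, "y": 2}')
names(obj)
```

By default `parse_json()` does not simplify arrays into atomic vectors, which is why both results come back as lists.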
### jsonlite
@ -729,7 +729,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
To convert JSON into R data structures, we recommend that you use the jsonlite package, by Jeroen Ooms.
We'll use only two jsonlite functions: `read_json()` and `parse_json()`.
In real life, you'll use `read_json()` to read a JSON file from disk.
For example, we the repurrsive package also provides the source for `gh_user` as a JSON file:
For example, the repurrrsive package also provides the source for `gh_user` as a JSON file:
```{r}
# A path to a json file inside the package: