Update rectangling.qmd (#1069)
It is a very nicely written chapter. Here I provided a few corrections on typos and errors.
parent
476f2c8282
commit
d080f3279c
|
@ -126,8 +126,8 @@ knitr::include_graphics("screenshots/View-3.png", dpi = 220)
|
|||
### List-columns
|
||||
|
||||
Lists can also live inside a tibble, where we call them list-columns.
|
||||
List-columns are useful because they allow you to shoehorn in objects that wouldn't wouldn't usually belong in a tibble.
|
||||
In particular, list-columns are are used a lot in the [tidymodels](https://www.tidymodels.org) ecosystem, because they allows you to store things like models or resamples in a data frame.
|
||||
List-columns are useful because they allow you to shoehorn in objects that wouldn't usually belong in a tibble.
|
||||
In particular, list-columns are are used a lot in the [tidymodels](https://www.tidymodels.org) ecosystem, because they allow you to store things like models or resamples in a data frame.
|
||||
|
||||
Here's a simple example of a list-column:
|
||||
|
||||
|
@ -187,7 +187,7 @@ It's easier to use list-columns with tibbles because `tibble()` treats lists lik
|
|||
|
||||
## Unnesting
|
||||
|
||||
Now that you've learned the basics of lists and list-columns, lets explore how you can turn them back into regular rows and columns.
|
||||
Now that you've learned the basics of lists and list-columns, let's explore how you can turn them back into regular rows and columns.
|
||||
We'll start with very simple sample data so you can get the basic idea, and then switch to more realistic examples in the next section.
|
||||
|
||||
List-columns tend to come in two basic forms: named and unnamed.
|
||||
|
@ -195,7 +195,7 @@ When the children are **named**, they tend to have the same names in every row.
|
|||
When the children are **unnamed**, the number of elements tends to vary from row-to-row.
|
||||
The following code creates an example of each.
|
||||
In `df1`, every element of list-column `y` has two elements named `a` and `b`.
|
||||
If `df2`, the elements of list-column `y` are unnamed and vary in length.
|
||||
In `df2`, the elements of list-column `y` are unnamed and vary in length.
|
||||
|
||||
```{r}
|
||||
df1 <- tribble(
|
||||
|
@ -316,7 +316,7 @@ You might wonder if this breaks the commandment that every element of a column m
|
|||
What happens if you find this problem in a dataset you're trying to rectangle?
|
||||
There are two basic options.
|
||||
You could use the `transform` argument to coerce all inputs to a common type.
|
||||
It's not particularly useful here because there's only really one class that these five class can be converted to: character.
|
||||
It's not particularly useful here because there's only really one class that these five class can be converted to character.
|
||||
|
||||
```{r}
|
||||
df4 |>
|
||||
|
@ -371,7 +371,7 @@ These are good to know about when you're other people's code and for tackling ra
|
|||
## Case studies
|
||||
|
||||
So far you've learned about the simplest case of list-columns, where rectangling only requires a single call to `unnest_longer()` or `unnest_wider()`.
|
||||
The main difference between real data and these simple examples is that real data typically containsmultiple levels of nesting that requires multiple calls to `unnest_longer()` and `unnest_wider()`.
|
||||
The main difference between real data and these simple examples is that real data typically contains multiple levels of nesting that require multiple calls to `unnest_longer()` and `unnest_wider()`.
|
||||
This section will work through four real rectangling challenges using datasets from the repurrrsive package that are inspired by datasets that we've encountered in the wild.
|
||||
|
||||
### Very wide data
|
||||
|
@ -426,7 +426,7 @@ repos |>
|
|||
|
||||
You can use this to work back to understand how `gh_repos` was strucured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.
|
||||
|
||||
`owner` is another list-column, and since it a contains a named list, we can use `unnest_wider()` to get at the values:
|
||||
`owner` is another list-column, and since it contains a named list, we can use `unnest_wider()` to get at the values:
|
||||
|
||||
```{r}
|
||||
#| error: true
|
||||
|
@ -624,7 +624,7 @@ locations |>
|
|||
unnest_wider(location)
|
||||
```
|
||||
|
||||
Extracting the bounds requires a few more steps
|
||||
Extracting the bounds requires a few more steps:
|
||||
|
||||
```{r}
|
||||
locations |>
|
||||
|
@ -649,7 +649,7 @@ locations |>
|
|||
|
||||
Note how we unnest two columns simultaneously by supplying a vector of variable names to `unnest_wider()`.
|
||||
|
||||
This somewhere that `hoist()`, mentioned briefly above, can be useful.
|
||||
This is somewhere that `hoist()`, mentioned briefly above, can be useful.
|
||||
Once you've discovered the path to get to the components you're interested in, you can extract them directly using `hoist()`:
|
||||
|
||||
```{r}
|
||||
|
@ -711,17 +711,17 @@ Four of them are scalars:
|
|||
|
||||
- The simplest type is a null, which is written `null`, which plays the same role as both `NULL` and `NA` in R. It represents the absence of data.
|
||||
- A **string** is much like a string in R, but must use double quotes, not single quotes.
|
||||
- A **number** is similar to R's numbers: they can be use integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support Inf, -Inf, or NaN.
|
||||
- A **number** is similar to R's numbers: they can be integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support Inf, -Inf, or NaN.
|
||||
- A **boolean** is similar to R's `TRUE` and `FALSE`, but use lower case `true` and `false`.
|
||||
|
||||
JSON's strings, numbers, and booleans are pretty similar to R's character, numeric, and logical vectors.
|
||||
The main difference is that JSON's scalars can only represent a single value.
|
||||
To represent multiple values you need to use one of the two remaining two types, arrays and objects.
|
||||
To represent multiple values you need to use one of the two remaining types, arrays and objects.
|
||||
|
||||
Both arrays and objects are similar to lists in R; the difference is whether or not they're named.
|
||||
An **array** is like an unnamed list, and is written with `[]`.
|
||||
For example `[1, 2, 3]` is an array containing 3 numbers, and `[null, 1, "string", false]` is an array that contains a null, a number, a string, and a boolean.
|
||||
An **object** is like a named list, and they're written with `{}`.
|
||||
An **object** is like a named list, and it's written with `{}`.
|
||||
For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
|
||||
|
||||
### jsonlite
|
||||
|
@ -729,7 +729,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
|
|||
To convert JSON into R data structures, we recommend that you use the jsonlite package, by Jeroen Oooms.
|
||||
We'll use only two jsonlite functions: `read_json()` and `parse_json()`.
|
||||
In real life, you'll use `read_json()` to read a JSON file from disk.
|
||||
For example, we the repurrsive package also provides the source for `gh_user` as a JSON file:
|
||||
For example, the repurrsive package also provides the source for `gh_user` as a JSON file:
|
||||
|
||||
```{r}
|
||||
# A path to a json file inside the package:
|
||||
|