Fix/rectangling probably typos (#1486)
* probably a typo
* probably a typo
* - "a" is not a factor but a character.
  - probably a typo
* a typo
* probably a typo
* probably a typo
* "ab" is a string, but is not a character, though can be an element of a character vector
* a typo
* probably a typo
* Update rectangling.qmd

---------

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
parent b2c4c1d0d0
commit 0bd216b75a
@@ -9,7 +9,7 @@ status("complete")
 
 ## Introduction
 
-In this chapter, you'll learn the art of data **rectangling**, taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
+In this chapter, you'll learn the art of data **rectangling**: taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
 This is important because hierarchical data is surprisingly common, especially when working with data that comes from the web.
 
 To learn about rectangling, you'll need to first learn about lists, the data structure that makes hierarchical data possible.
@@ -263,12 +263,12 @@ df6 |> unnest_longer(y)
 ```
 
 We get zero rows in the output, so the row effectively disappears.
-If you want to preserve that row, adding add `NA` in `y` by setting `keep_empty = TRUE`.
+If you want to preserve that row, adding `NA` in `y`, set `keep_empty = TRUE`.
 
 ### Inconsistent types
 
 What happens if you unnest a list-column that contains different types of vector?
-For example, take the following dataset where the list-column `y` contains two numbers, a factor, and a logical, which can't normally be mixed in a single column.
+For example, take the following dataset where the list-column `y` contains two numbers, a character, and a logical, which can't normally be mixed in a single column.
 
 ```{r}
 df4 <- tribble(
@@ -292,7 +292,7 @@ Because `unnest_longer()` can't find a common type of vector, it keeps the origi
 You might wonder if this breaks the commandment that every element of a column must be the same type.
 It doesn't: every element is a list, even though the contents are of different types.
 
-Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll mostly likely need tools from @sec-iteration.
+Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll most likely need tools from @sec-iteration.
 
 ### Other functions
 
@@ -444,7 +444,7 @@ chars |>
 select(id, where(is.list))
 ```
 
-Lets explore the `titles` column.
+Let's explore the `titles` column.
 It's an unnamed list-column, so we'll unnest it into rows:
 
 ```{r}
@@ -509,7 +509,7 @@ locations
 
 Now we can see why two cities got two results: Washington matched both Washington state and Washington, DC, and Arlington matched Arlington, Virginia and Arlington, Texas.
 
-There are few different places we could go from here.
+There are a few different places we could go from here.
 We might want to determine the exact location of the match, which is stored in the `geometry` list-column:
 
 ```{r}
@@ -576,7 +576,7 @@ If these case studies have whetted your appetite for more real-life rectangling,
 Why can you only roughly estimate the date?
 
 2. The `owner` column of `gh_repo` contains a lot of duplicated information because each owner can have many repos.
-Can you construct a `owners` data frame that contains one row for each owner?
+Can you construct an `owners` data frame that contains one row for each owner?
 (Hint: does `distinct()` work with `list-cols`?)
 
 3. Follow the steps used for `titles` to create similar tables for the aliases, allegiances, books, and TV series for the Game of Thrones characters.
@@ -634,7 +634,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
 
 Note that JSON doesn't have any native way to represent dates or date-times, so they're often stored as strings, and you'll need to use `readr::parse_date()` or `readr::parse_datetime()` to turn them into the correct data structure.
 Similarly, JSON's rules for representing floating point numbers in JSON are a little imprecise, so you'll also sometimes find numbers stored in strings.
-Apply `readr::parse_double()` as needed to the get correct variable type.
+Apply `readr::parse_double()` as needed to get the correct variable type.
 
 ### jsonlite
 
@@ -741,7 +741,7 @@ df |>
 
 In this chapter, you learned what lists are, how you can generate them from JSON files, and how turn them into rectangular data frames.
 Surprisingly we only need two new functions: `unnest_longer()` to put list elements into rows and `unnest_wider()` to put list elements into columns.
-It doesn't matter how deeply nested the list-column is, all you need to do is repeatedly call these two functions.
+It doesn't matter how deeply nested the list-column is; all you need to do is repeatedly call these two functions.
 
 JSON is the most common data format returned by web APIs.
 What happens if the website doesn't have an API, but you can see data you want on the website?