More minor page count tweaks & fixes
And re-convert with latest htmlbook
<section data-type="chapter" id="chp-rectangling">
<h1><span id="sec-rectangling" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Hierarchical data</span></span></h1>
<section id="rectangling-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>In this chapter, you’ll learn the art of data <strong>rectangling</strong>, taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns. This is important because hierarchical data is surprisingly common, especially when working with data that comes from the web.</p>
<p>To learn about rectangling, you’ll need to first learn about lists, the data structure that makes hierarchical data possible. Then you’ll learn about two crucial tidyr functions: <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">tidyr::unnest_longer()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">tidyr::unnest_wider()</a></code>. We’ll then show you a few case studies, applying these simple functions again and again to solve real problems. We’ll finish off by talking about JSON, the most frequent source of hierarchical datasets and a common format for data exchange on the web.</p>

<section id="rectangling-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>In this chapter, we’ll use many functions from tidyr, a core member of the tidyverse. We’ll also use repurrrsive to provide some interesting datasets for rectangling practice, and we’ll finish by using jsonlite to read JSON files into R lists.</p>
</section>
</section>

<section id="rectangling-lists" data-type="sect1">
<h1>
Lists</h1>
<p>So far you’ve worked with data frames that contain simple vectors like integers, numbers, characters, date-times, and factors. These vectors are simple because they’re homogeneous: every element is of the same data type. If you want to store elements of different types in the same vector, you’ll need a <strong>list</strong>, which you create with <code><a href="https://rdrr.io/r/base/list.html">list()</a></code>:</p>
<p>Similarly, if you <code><a href="https://rdrr.io/r/utils/View.html">View()</a></code> a data frame in RStudio, you’ll get the standard tabular view, which doesn’t allow you to selectively expand list columns. To explore those fields you’ll need to <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> and view, e.g. <code>df |> pull(z) |> View()</code>.</p>
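<p>Here’s a minimal sketch of that workflow using a small, made-up tibble (the <code>df</code> below is hypothetical, not one of the chapter’s datasets). It prints the extracted list with <code>str()</code> instead of <code>View()</code>, so it also runs outside RStudio:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(dplyr)

# Hypothetical data frame with a list-column z
df <- tibble(
  x = 1:2,
  z = list(list(a = 1, b = "one"), list(a = 2, b = "two"))
)

# pull() extracts z as an ordinary list; in RStudio you could pipe it to View()
z <- df |> pull(z)
str(z)</pre>
</div>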
<div data-type="note"><h1>
Base R
</h1>

<p>It’s possible to put a list in a column of a <code>data.frame</code>, but it’s a lot fiddlier because <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> treats a list as a list of columns:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">data.frame(x = list(1:3, 3:5))
#>   x.1.3 x.3.5
#> 1     1     3
#> 2     2     4
#> 3     3     5</pre>
</div>
<p>You can force <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> to treat a list as a list of rows by wrapping it in <code><a href="https://rdrr.io/r/base/AsIs.html">I()</a></code>, but the result doesn’t print particularly well:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">data.frame(
  x = I(list(1:2, 3:5)), 
  y = c("1, 2", "3, 4, 5")
)
#>         x       y
#> 1    1, 2    1, 2
#> 2 3, 4, 5 3, 4, 5</pre>
</div>
<p>It’s easier to use list-columns with tibbles because <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> treats lists like vectors and the print method has been designed with lists in mind.</p>
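<p>For comparison, here’s a minimal sketch that builds the same data with <code>tibble()</code>. The list becomes a single list-column, and no <code>I()</code> is needed:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)

# tibble() keeps the list as one list-column instead of splitting it into columns
df <- tibble(
  x = list(1:2, 3:5),
  y = c("1, 2", "3, 4, 5")
)
df</pre>
</div>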

</div>

</section>
</section>

<section id="unnest_wider" data-type="sect2">
<h2>
unnest_wider()
</h2>
<p>When each row has the same number of elements with the same names, like <code>df1</code>, it’s natural to put each component into its own column with <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code>:</p>
<div class="cell">

<section id="unnest_longer" data-type="sect2">
<h2>
unnest_longer()
</h2>
<p>When each row contains an unnamed list, it’s most natural to put each element into its own row with <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">unnest_longer()</a></code>:</p>
<div class="cell">
<p>You’ll learn more about <code><a href="https://purrr.tidyverse.org/reference/map.html">map_lgl()</a></code> in <a href="#chp-iteration" data-type="xref">#chp-iteration</a>.</p>
</section>

<section id="rectangling-other-functions" data-type="sect2">
<h2>
Other functions</h2>
<p>tidyr has a few other useful rectangling functions that we’re not going to cover in this book:</p>
</ul><p>These functions are good to know about as you might encounter them when reading other people’s code or tackling rarer rectangling challenges yourself.</p>
</section>

<section id="rectangling-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
  unnest_longer(json) |> 
  unnest_wider(json) 
#> # A tibble: 176 × 68
#>         id name        full_name         owner        private html_url       
#>      <int> <chr>       <chr>             <list>       <lgl>   <chr>          
#> 1 61160198 after       gaborcsardi/after <named list> FALSE   https://github…
#> 2 40500181 argufy      gaborcsardi/argu… <named list> FALSE   https://github…
#> 3 36442442 ask         gaborcsardi/ask   <named list> FALSE   https://github…
#> 4 34924886 baseimports gaborcsardi/base… <named list> FALSE   https://github…
#> 5 61620661 citest      gaborcsardi/cite… <named list> FALSE   https://github…
#> 6 33907457 clisymbols  gaborcsardi/clis… <named list> FALSE   https://github…
#> # … with 170 more rows, and 62 more variables: description <chr>,
#> #   fork <lgl>, url <chr>, forks_url <chr>, keys_url <chr>, …</pre>
</div>
<p>This has worked, but the result is a little overwhelming: there are so many columns that tibble doesn’t even print all of them! We can list the column names with <code><a href="https://rdrr.io/r/base/names.html">names()</a></code>; here we look at just the first 10:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">repos |> 
  unnest_longer(json) |> 
  unnest_wider(json) |> 
  names() |> 
  head(10)
#>  [1] "id"          "name"        "full_name"   "owner"       "private"    
#>  [6] "html_url"    "description" "fork"        "url"         "forks_url"</pre>
</div>
<p>Let’s select a few that look interesting:</p>
<div class="cell">
#> 6 33907457 gaborcsardi/clisymbols  <named list [17]> Unicode symbols for CLI…
#> # … with 170 more rows</pre>
</div>
<p>You can use this to work back to understand how <code>gh_repos</code> was structured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.</p>
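<p>As a quick sanity check of that structure, you can count how many repositories are stored for each user with <code>lengths()</code>. This is a minimal sketch that assumes repurrrsive is installed. Because each user becomes one row per repository after <code>unnest_longer()</code>, the counts should add up to the 176 rows shown earlier:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(repurrrsive)

# Each top-level element of gh_repos is one user;
# lengths() counts the repositories stored for each of them
n_repos <- lengths(gh_repos)
n_repos
sum(n_repos)</pre>
</div>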
<p><code>owner</code> is another list-column, and since it contains a named list, we can use <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code> to get at the values:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">repos |> 
  unnest_longer(json) |> 
  unnest_wider(json) |> 
  select(id, full_name, owner, description) |> 
  unnest_wider(owner)
#> Error in `unnest_wider()`:
#> ! Can't duplicate names between the affected columns and the original
#>   data.
#> ✖ These names are duplicated:
#>   ℹ `id`, from `owner`.
#> ℹ Use `names_sep` to disambiguate using the column name.
#> ℹ Or use `names_repair` to specify a repair strategy.</pre>
</div>
<!--# TODO: https://github.com/tidyverse/tidyr/issues/1390 -->
<p>Uh oh, this list column also contains an <code>id</code> column and we can’t have two <code>id</code> columns in the same data frame. Rather than following the advice to use <code>names_repair</code> (which would also work), we’ll instead use <code>names_sep</code>:</p>
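<div class="cell">
<pre data-type="programlisting" data-code-language="r">repos |> 
  unnest_longer(json) |> 
  unnest_wider(json) |> 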
  select(id, full_name, owner, description) |> 
  unnest_wider(owner, names_sep = "_")
#> # A tibble: 176 × 20
#>         id full_name               owner_login owner_id owner_avatar_url     
#>      <int> <chr>                   <chr>          <int> <chr>                
#> 1 61160198 gaborcsardi/after       gaborcsardi   660288 https://avatars.gith…
#> 2 40500181 gaborcsardi/argufy      gaborcsardi   660288 https://avatars.gith…
#> 3 36442442 gaborcsardi/ask         gaborcsardi   660288 https://avatars.gith…
#> 4 34924886 gaborcsardi/baseimports gaborcsardi   660288 https://avatars.gith…
#> 5 61620661 gaborcsardi/citest      gaborcsardi   660288 https://avatars.gith…
#> 6 33907457 gaborcsardi/clisymbols  gaborcsardi   660288 https://avatars.gith…
#> # … with 170 more rows, and 15 more variables: owner_gravatar_id <chr>,
#> #   owner_url <chr>, owner_html_url <chr>, owner_followers_url <chr>, …</pre>
</div>
<p>This gives another wide dataset, but you can see that <code>owner</code> appears to contain a lot of additional data about the person who “owns” the repository.</p>
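<p>To see what <code>names_sep</code> does in isolation, here’s a minimal sketch with a hypothetical two-row tibble. Its nested <code>owner</code> records also contain an <code>id</code> field, which reproduces the same name clash on a small scale:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(tidyr)

# Hypothetical toy data: both the outer tibble and the nested
# owner records have a field called `id`
df <- tibble(
  id = 1:2,
  owner = list(
    list(id = 10L, login = "a"),
    list(id = 20L, login = "b")
  )
)

# names_sep = "_" prefixes each new column with the list-column's name,
# so the inner `id` becomes `owner_id` and no longer clashes with `id`
wide <- df |> unnest_wider(owner, names_sep = "_")
wide</pre>
</div>
<p>Each new column name is the list-column name, then the separator, then the field name. That is why the inner <code>id</code> ends up as <code>owner_id</code>.</p>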
</section>
<pre data-type="programlisting" data-code-language="r">chars |> 
  unnest_wider(json)
#> # A tibble: 30 × 18
#>   url                    id name            gender culture    born           
#>   <chr>               <int> <chr>           <chr>  <chr>      <chr>          
#> 1 https://www.anapio…  1022 Theon Greyjoy   Male   "Ironborn" "In 278 AC or …
#> 2 https://www.anapio…  1052 Tyrion Lannist… Male   ""         "In 273 AC, at…
#> 3 https://www.anapio…  1074 Victarion Grey… Male   "Ironborn" "In 268 AC or …
#> 4 https://www.anapio…  1109 Will            Male   ""         ""             
#> 5 https://www.anapio…  1166 Areo Hotah      Male   "Norvoshi" "In 257 AC or …
#> 6 https://www.anapio…  1267 Chett           Male   ""         "At Hag's Mire"
#> # … with 24 more rows, and 12 more variables: died <chr>, alive <lgl>,
#> #   titles <list>, aliases <list>, father <chr>, mother <chr>, …</pre>
</div>
<p>And selecting a few columns to make it easier to read:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">characters <- chars |> 
  unnest_wider(json) |> 
  select(id, name, gender, culture, born, died, alive)
characters
#> # A tibble: 30 × 7
#>      id name              gender culture    born              died           
#>   <int> <chr>             <chr>  <chr>      <chr>             <chr>          
#> 1  1022 Theon Greyjoy     Male   "Ironborn" "In 278 AC or 27… ""             
#> 2  1052 Tyrion Lannister  Male   ""         "In 273 AC, at C… ""             
#> 3  1074 Victarion Greyjoy Male   "Ironborn" "In 268 AC or be… ""             
#> 4  1109 Will              Male   ""         ""                "In 297 AC, at…
#> 5  1166 Areo Hotah        Male   "Norvoshi" "In 257 AC or be… ""             
#> 6  1267 Chett             Male   ""         "At Hag's Mire"   "In 299 AC, at…
#> # … with 24 more rows, and 1 more variable: alive <lgl></pre>
</div>
<p>There are also many list-columns:</p>
<div class="cell">
  unnest_wider(results)
locations
#> # A tibble: 7 × 6
#>   city       address_compone…¹ formatted_address geometry     place_id       
#>   <chr>      <list>            <chr>             <list>       <chr>          
#> 1 Houston    <list [4]>        Houston, TX, USA  <named list> ChIJAYWNSLS4QI…
#> 2 Washington <list [2]>        Washington, USA   <named list> ChIJ-bDD5__lhV…
#> 3 Washington <list [4]>        Washington, DC, … <named list> ChIJW-T2Wt7Gt4…
#> 4 New York   <list [3]>        New York, NY, USA <named list> ChIJOwg_06VPwo…
#> 5 Chicago    <list [4]>        Chicago, IL, USA  <named list> ChIJ7cv00DwsDo…
#> 6 Arlington  <list [4]>        Arlington, TX, U… <named list> ChIJ05gI5NJiTo…
#> # … with 1 more row, 1 more variable: types <list>, and abbreviated variable
#> #   name ¹address_components</pre>
</div>
<p>Now we can see why two cities got two results: Washington matched both Washington state and Washington, DC, and Arlington matched Arlington, Virginia and Arlington, Texas.</p>
<p>There are a few different places we could go from here. We might want to determine the exact location of the match, which is stored in the <code>geometry</code> list-column:</p>
<p>If these case studies have whetted your appetite for more real-life rectangling, you can see a few more examples in <code>vignette("rectangling", package = "tidyr")</code>.</p>
</section>

<section id="rectangling-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Roughly estimate when <code>gh_repos</code> was created. Why can you only roughly estimate the date?</p></li>
JSON</h1>
<p>All of the case studies in the previous section were sourced from wild-caught JSON. JSON is short for <strong>J</strong>ava<strong>S</strong>cript <strong>O</strong>bject <strong>N</strong>otation and is the way that most web APIs return data. JSON’s data types are quite similar to R’s, but there isn’t a perfect 1-to-1 mapping, so it helps to understand a bit about JSON when things go wrong.</p>
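<p>To get a feel for how JSON maps onto R, here’s a minimal sketch that parses a small, made-up JSON string with jsonlite’s <code>parse_json()</code>. By default, JSON objects become named lists and arrays become unnamed lists. Setting <code>simplifyVector = TRUE</code> asks jsonlite to turn arrays of scalars into atomic vectors instead:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(jsonlite)

# A small, made-up JSON object
json <- '{"name": "after", "private": false, "topics": ["r", "cli"], "size": 12}'

# Default: the object becomes a named list and the array becomes a list
x <- parse_json(json)
str(x)

# simplifyVector = TRUE turns the array of strings into a character vector
y <- parse_json(json, simplifyVector = TRUE)
y$topics</pre>
</div>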

<section id="rectangling-data-types" data-type="sect2">
<h2>
Data types</h2>
<p>JSON is a simple format designed to be easily read and written by machines, not humans. It has six key data types. Four of them are scalars:</p>
<p>Since JSON doesn’t have any way to represent dates or date-times, they’re often stored as ISO8601 date-times in strings, and you’ll need to use <code><a href="https://readr.tidyverse.org/reference/parse_datetime.html">readr::parse_date()</a></code> or <code><a href="https://readr.tidyverse.org/reference/parse_datetime.html">readr::parse_datetime()</a></code> to turn them into the correct data structure. Similarly, JSON’s rules for representing floating point numbers are a little imprecise, so you’ll also sometimes find numbers stored in strings. Apply <code><a href="https://readr.tidyverse.org/reference/parse_atomic.html">readr::parse_double()</a></code> as needed to get the correct variable type.</p>
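<p>Here’s a minimal sketch of that clean-up step. It uses a made-up JSON snippet in which both a timestamp and a number arrive as strings; the field names <code>created_at</code> and <code>size</code> are only for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(jsonlite)
library(readr)

# Made-up JSON where a date-time and a number are both stored as strings
x <- parse_json('{"created_at": "2016-07-08T08:20:29Z", "size": "12.5"}')

# parse_datetime() understands ISO8601 by default and returns a POSIXct in UTC
created <- parse_datetime(x$created_at)

# parse_double() turns the numeric string into a double
size <- parse_double(x$size)

created
size</pre>
</div>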
</section>

<section id="rectangling-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
</ol></section>
</section>

<section id="rectangling-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, you learned what lists are, how you can generate them from JSON files, and how to turn them into rectangular data frames. Surprisingly, we only need two new functions: <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">unnest_longer()</a></code> to put list elements into rows and <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code> to put list elements into columns. It doesn’t matter how deeply nested the list-column is; all you need to do is repeatedly call these two functions.</p>
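<p>As a compact recap, here’s a minimal sketch on a hypothetical tibble in which each row holds an unnamed list of named records. One call to <code>unnest_longer()</code> gives each record its own row, and one call to <code>unnest_wider()</code> then gives each field its own column:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(tidyr)

# Hypothetical data: each row holds an unnamed list of named records
df <- tibble(
  id = 1:2,
  items = list(
    list(list(a = 1, b = "x"), list(a = 2, b = "y")),
    list(list(a = 3, b = "z"))
  )
)

out <- df |>
  unnest_longer(items) |>  # one row per record
  unnest_wider(items)      # one column per field: a and b
out</pre>
</div>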