More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions


@@ -1,12 +1,12 @@
<section data-type="chapter" id="chp-rectangling">
<h1><span id="sec-rectangling" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Hierarchical data</span></span></h1>
<section id="introduction" data-type="sect1">
<section id="rectangling-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>In this chapter, you’ll learn the art of data <strong>rectangling</strong>, taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns. This is important because hierarchical data is surprisingly common, especially when working with data that comes from the web.</p>
<p>To learn about rectangling, you’ll need to first learn about lists, the data structure that makes hierarchical data possible. Then you’ll learn about two crucial tidyr functions: <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">tidyr::unnest_longer()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">tidyr::unnest_wider()</a></code>. We’ll then show you a few case studies, applying these simple functions again and again to solve real problems. We’ll finish off by talking about JSON, the most frequent source of hierarchical datasets and a common format for data exchange on the web.</p>
<section id="prerequisites" data-type="sect2">
<section id="rectangling-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>In this chapter, we’ll use many functions from tidyr, a core member of the tidyverse. We’ll also use repurrrsive to provide some interesting datasets for rectangling practice, and we’ll finish by using jsonlite to read JSON files into R lists.</p>
@@ -18,7 +18,7 @@ library(jsonlite)</pre>
</section>
</section>
<section id="lists" data-type="sect1">
<section id="rectangling-lists" data-type="sect1">
<h1>
Lists</h1>
<p>So far you’ve worked with data frames that contain simple vectors like integers, numbers, characters, date-times, and factors. These vectors are simple because they’re homogeneous: every element is of the same data type. If you want to store elements of different types in the same vector, you’ll need a <strong>list</strong>, which you create with <code><a href="https://rdrr.io/r/base/list.html">list()</a></code>:</p>
@@ -174,13 +174,19 @@ df
<p>Similarly, if you <code><a href="https://rdrr.io/r/utils/View.html">View()</a></code> a data frame in RStudio, you’ll get the standard tabular view, which doesn’t allow you to selectively expand list columns. To explore those fields you’ll need to <code><a href="https://dplyr.tidyverse.org/reference/pull.html">pull()</a></code> and view, e.g. <code>df |&gt; pull(z) |&gt; View()</code>.</p>
<div data-type="note"><h1>
Base R
</h1><p>It’s possible to put a list in a column of a <code>data.frame</code>, but it’s a lot fiddlier because <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> treats a list as a list of columns:</p><div class="cell">
</h1>
<p>It’s possible to put a list in a column of a <code>data.frame</code>, but it’s a lot fiddlier because <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> treats a list as a list of columns:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">data.frame(x = list(1:3, 3:5))
#&gt; x.1.3 x.3.5
#&gt; 1 1 3
#&gt; 2 2 4
#&gt; 3 3 5</pre>
</div><p>You can force <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> to treat a list as a list of rows by wrapping it in <code><a href="https://rdrr.io/r/base/AsIs.html">I()</a></code>, but the result doesn’t print particularly well:</p><div class="cell">
</div>
<p>You can force <code><a href="https://rdrr.io/r/base/data.frame.html">data.frame()</a></code> to treat a list as a list of rows by wrapping it in <code><a href="https://rdrr.io/r/base/AsIs.html">I()</a></code>, but the result doesn’t print particularly well:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">data.frame(
x = I(list(1:2, 3:5)),
y = c("1, 2", "3, 4, 5")
@@ -188,7 +194,10 @@ Base R
#&gt; x y
#&gt; 1 1, 2 1, 2
#&gt; 2 3, 4, 5 3, 4, 5</pre>
</div><p>It’s easier to use list-columns with tibbles because <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> treats lists like vectors and the print method has been designed with lists in mind.</p></div>
</div>
<p>It’s easier to use list-columns with tibbles because <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> treats lists like vectors and the print method has been designed with lists in mind.</p>
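<p>As a minimal sketch, the same data can be stored in a tibble without any <code>I()</code> wrapper. The name <code>df_list</code> is ours, chosen only for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)

# tibble() keeps each element of the list as one cell of the list-column x
df_list &lt;- tibble(
  x = list(1:2, 3:5),
  y = c("1, 2", "3, 4, 5")
)

# Printing summarises each list element instead of flattening it to text
df_list</pre>
</div>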
</div>
</section>
</section>
@@ -220,7 +229,7 @@ df2 &lt;- tribble(
<section id="unnest_wider" data-type="sect2">
<h2>
<code>unnest_wider()</code>
unnest_wider()
</h2>
<p>When each row has the same number of elements with the same names, like <code>df1</code>, it’s natural to put each component into its own column with <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code>:</p>
<div class="cell">
@@ -260,7 +269,7 @@ df2 &lt;- tribble(
<section id="unnest_longer" data-type="sect2">
<h2>
<code>unnest_longer()</code>
unnest_longer()
</h2>
<p>When each row contains an unnamed list, it’s most natural to put each element into its own row with <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">unnest_longer()</a></code>:</p>
<div class="cell">
@@ -387,7 +396,7 @@ Inconsistent types</h2>
<p>You’ll learn more about <code><a href="https://purrr.tidyverse.org/reference/map.html">map_lgl()</a></code> in <a href="#chp-iteration" data-type="xref">#chp-iteration</a>.</p>
</section>
<section id="other-functions" data-type="sect2">
<section id="rectangling-other-functions" data-type="sect2">
<h2>
Other functions</h2>
<p>tidyr has a few other useful rectangling functions that we’re not going to cover in this book:</p>
@@ -400,7 +409,7 @@ Other functions</h2>
</ul><p>These functions are good to know about as you might encounter them when reading other people’s code or tackling rarer rectangling challenges yourself.</p>
</section>
<section id="exercises" data-type="sect2">
<section id="rectangling-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
@@ -460,51 +469,26 @@ repos
unnest_longer(json) |&gt;
unnest_wider(json)
#&gt; # A tibble: 176 × 68
#&gt; id name full_name owner private html_url description fork
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;list&gt; &lt;lgl&gt; &lt;chr&gt; &lt;chr&gt; &lt;lgl&gt;
#&gt; 1 61160198 after gaborcsa &lt;named list&gt; FALSE https:/… Run Code i… FALSE
#&gt; 2 40500181 argufy gaborcsa… &lt;named list&gt; FALSE https:/… Declarativ… FALSE
#&gt; 3 36442442 ask gaborcsa &lt;named list&gt; FALSE https:/… Friendly C… FALSE
#&gt; 4 34924886 baseimp gaborcsa… &lt;named list&gt; FALSE https:/… Do we get … FALSE
#&gt; 5 61620661 citest gaborcsa… &lt;named list&gt; FALSE https:/… Test R pac… TRUE
#&gt; 6 33907457 clisymb gaborcsa… &lt;named list&gt; FALSE https:/… Unicode sy… FALSE
#&gt; # … with 170 more rows, and 60 more variables: url &lt;chr&gt;, forks_url &lt;chr&gt;,
#&gt; # keys_url &lt;chr&gt;, collaborators_url &lt;chr&gt;, teams_url &lt;chr&gt;,
#&gt; # hooks_url &lt;chr&gt;, issue_events_url &lt;chr&gt;, events_url &lt;chr&gt;,
#&gt; # assignees_url &lt;chr&gt;, branches_url &lt;chr&gt;, tags_url &lt;chr&gt;,
#&gt; # blobs_url &lt;chr&gt;, git_tags_url &lt;chr&gt;, git_refs_url &lt;chr&gt;,
#&gt; # trees_url &lt;chr&gt;, statuses_url &lt;chr&gt;, languages_url &lt;chr&gt;,
#&gt; # stargazers_url &lt;chr&gt;, contributors_url &lt;chr&gt;, subscribers_url &lt;chr&gt;, …</pre>
#&gt; id name full_name owner private html_url
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;list&gt; &lt;lgl&gt; &lt;chr&gt;
#&gt; 1 61160198 after gaborcsardi/after &lt;named list&gt; FALSE https://github…
#&gt; 2 40500181 argufy gaborcsardi/argu&lt;named list&gt; FALSE https://github…
#&gt; 3 36442442 ask gaborcsardi/ask &lt;named list&gt; FALSE https://github…
#&gt; 4 34924886 baseimports gaborcsardi/base&lt;named list&gt; FALSE https://github…
#&gt; 5 61620661 citest gaborcsardi/cite&lt;named list&gt; FALSE https://github…
#&gt; 6 33907457 clisymbols gaborcsardi/clis&lt;named list&gt; FALSE https://github…
#&gt; # … with 170 more rows, and 62 more variables: description &lt;chr&gt;,
#&gt; # fork &lt;lgl&gt;, url &lt;chr&gt;, forks_url &lt;chr&gt;, keys_url &lt;chr&gt;,</pre>
</div>
<p>This has worked but the result is a little overwhelming: there are so many columns that tibble doesn’t even print all of them! We can see them all with <code><a href="https://rdrr.io/r/base/names.html">names()</a></code>:</p>
<p>This has worked but the result is a little overwhelming: there are so many columns that tibble doesn’t even print all of them! We can see them all with <code><a href="https://rdrr.io/r/base/names.html">names()</a></code>; here we look at the first 10:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">repos |&gt;
unnest_longer(json) |&gt;
unnest_wider(json) |&gt;
names()
#&gt; [1] "id" "name" "full_name"
#&gt; [4] "owner" "private" "html_url"
#&gt; [7] "description" "fork" "url"
#&gt; [10] "forks_url" "keys_url" "collaborators_url"
#&gt; [13] "teams_url" "hooks_url" "issue_events_url"
#&gt; [16] "events_url" "assignees_url" "branches_url"
#&gt; [19] "tags_url" "blobs_url" "git_tags_url"
#&gt; [22] "git_refs_url" "trees_url" "statuses_url"
#&gt; [25] "languages_url" "stargazers_url" "contributors_url"
#&gt; [28] "subscribers_url" "subscription_url" "commits_url"
#&gt; [31] "git_commits_url" "comments_url" "issue_comment_url"
#&gt; [34] "contents_url" "compare_url" "merges_url"
#&gt; [37] "archive_url" "downloads_url" "issues_url"
#&gt; [40] "pulls_url" "milestones_url" "notifications_url"
#&gt; [43] "labels_url" "releases_url" "deployments_url"
#&gt; [46] "created_at" "updated_at" "pushed_at"
#&gt; [49] "git_url" "ssh_url" "clone_url"
#&gt; [52] "svn_url" "homepage" "size"
#&gt; [55] "stargazers_count" "watchers_count" "language"
#&gt; [58] "has_issues" "has_downloads" "has_wiki"
#&gt; [61] "has_pages" "forks_count" "mirror_url"
#&gt; [64] "open_issues_count" "forks" "open_issues"
#&gt; [67] "watchers" "default_branch"</pre>
names() |&gt;
head(10)
#&gt; [1] "id" "name" "full_name" "owner" "private"
#&gt; [6] "html_url" "description" "fork" "url" "forks_url"</pre>
</div>
<p>Let’s select a few that look interesting:</p>
<div class="cell">
@@ -523,7 +507,7 @@ repos
#&gt; 6 33907457 gaborcsardi/clisymbols &lt;named list [17]&gt; Unicode symbols for CLI…
#&gt; # … with 170 more rows</pre>
</div>
<p>You can use this to work back to understand how <code>gh_repos</code> was strucured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.</p>
<p>You can use this to work back to understand how <code>gh_repos</code> was structured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.</p>
<p><code>owner</code> is another list-column, and since it contains a named list, we can use <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code> to get at the values:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">repos |&gt;
@@ -531,11 +515,13 @@ repos
unnest_wider(json) |&gt;
select(id, full_name, owner, description) |&gt;
unnest_wider(owner)
#&gt; Error in `unpack()` at tidyr/R/unnest-wider.R:121:2:
#&gt; ! Names must be unique.
#&gt; Error in `unnest_wider()`:
#&gt; ! Can't duplicate names between the affected columns and the original
#&gt; data.
#&gt; ✖ These names are duplicated:
#&gt; * "id" at locations 1 and 4.
#&gt; Use argument `names_repair` to specify repair strategy.</pre>
#&gt; `id`, from `owner`.
#&gt; Use `names_sep` to disambiguate using the column name.
#&gt; Or use `names_repair` to specify a repair strategy.</pre>
</div>
<!--# TODO: https://github.com/tidyverse/tidyr/issues/1390 -->
<p>Uh oh, this list column also contains an <code>id</code> column and we can’t have two <code>id</code> columns in the same data frame. Rather than following the advice to use <code>names_repair</code> (which would also work), we’ll instead use <code>names_sep</code>:</p>
@@ -546,21 +532,16 @@ repos
select(id, full_name, owner, description) |&gt;
unnest_wider(owner, names_sep = "_")
#&gt; # A tibble: 176 × 20
#&gt; id full_name owner_login owner_id owner_avatar_url owner_gravatar_id
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 61160198 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; 2 40500181 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; 3 36442442 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; 4 34924886 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; 5 61620661 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; 6 33907457 gaborcsar gaborcsardi 660288 https://avatars… ""
#&gt; # … with 170 more rows, and 14 more variables: owner_url &lt;chr&gt;,
#&gt; # owner_html_url &lt;chr&gt;, owner_followers_url &lt;chr&gt;,
#&gt; # owner_following_url &lt;chr&gt;, owner_gists_url &lt;chr&gt;,
#&gt; # owner_starred_url &lt;chr&gt;, owner_subscriptions_url &lt;chr&gt;,
#&gt; # owner_organizations_url &lt;chr&gt;, owner_repos_url &lt;chr&gt;,
#&gt; # owner_events_url &lt;chr&gt;, owner_received_events_url &lt;chr&gt;,
#&gt; # owner_type &lt;chr&gt;, owner_site_admin &lt;lgl&gt;, description &lt;chr&gt;</pre>
#&gt; id full_name owner_login owner_id owner_avatar_url
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt;
#&gt; 1 61160198 gaborcsardi/after gaborcsardi 660288 https://avatars.gith
#&gt; 2 40500181 gaborcsardi/argufy gaborcsardi 660288 https://avatars.gith
#&gt; 3 36442442 gaborcsardi/ask gaborcsardi 660288 https://avatars.gith
#&gt; 4 34924886 gaborcsardi/baseimports gaborcsardi 660288 https://avatars.gith
#&gt; 5 61620661 gaborcsardi/citest gaborcsardi 660288 https://avatars.gith
#&gt; 6 33907457 gaborcsardi/clisymbols gaborcsardi 660288 https://avatars.gith
#&gt; # … with 170 more rows, and 15 more variables: owner_gravatar_id &lt;chr&gt;,
#&gt; # owner_url &lt;chr&gt;, owner_html_url &lt;chr&gt;, owner_followers_url &lt;chr&gt;,</pre>
</div>
<p>This gives another wide dataset, but you can see that <code>owner</code> appears to contain a lot of additional data about the person who “owns” the repository.</p>
</section>
@@ -588,17 +569,16 @@ chars
<pre data-type="programlisting" data-code-language="r">chars |&gt;
unnest_wider(json)
#&gt; # A tibble: 30 × 18
#&gt; url id name gender culture born died alive titles aliases father
#&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;lgl&gt; &lt;list&gt; &lt;list&gt; &lt;chr&gt;
#&gt; 1 https:/… 1022 Theo Male "Ironb "In … "" TRUE &lt;chr&gt; &lt;chr&gt; ""
#&gt; 2 https:/… 1052 Tyri… Male "" "In … "" TRUE &lt;chr&gt; &lt;chr&gt; ""
#&gt; 3 https:/… 1074 Vict… Male "Ironb "In … "" TRUE &lt;chr&gt; &lt;chr&gt; ""
#&gt; 4 https:/… 1109 Will Male "" "" "In … FALSE &lt;chr&gt; &lt;chr&gt; ""
#&gt; 5 https:/… 1166 Areo Male "Norvo "In … "" TRUE &lt;chr&gt; &lt;chr&gt; ""
#&gt; 6 https:/… 1267 Chett Male "" "At … "In … FALSE &lt;chr&gt; &lt;chr&gt; ""
#&gt; # … with 24 more rows, and 7 more variables: mother &lt;chr&gt;, spouse &lt;chr&gt;,
#&gt; # allegiances &lt;list&gt;, books &lt;list&gt;, povBooks &lt;list&gt;, tvSeries &lt;list&gt;,
#&gt; # playedBy &lt;list&gt;</pre>
#&gt; url id name gender culture born
#&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 https://www.anapio… 1022 Theon Greyjoy Male "Ironborn" "In 278 AC or …
#&gt; 2 https://www.anapio… 1052 Tyrion Lannist… Male "" "In 273 AC, at…
#&gt; 3 https://www.anapio… 1074 Victarion Grey… Male "Ironborn" "In 268 AC or …
#&gt; 4 https://www.anapio… 1109 Will Male "" ""
#&gt; 5 https://www.anapio… 1166 Areo Hotah Male "Norvoshi" "In 257 AC or …
#&gt; 6 https://www.anapio… 1267 Chett Male "" "At Hag's Mire"
#&gt; # … with 24 more rows, and 12 more variables: died &lt;chr&gt;, alive &lt;lgl&gt;,
#&gt; # titles &lt;list&gt;, aliases &lt;list&gt;, father &lt;chr&gt;, mother &lt;chr&gt;,</pre>
</div>
<p>And selecting a few columns to make it easier to read:</p>
<div class="cell">
@@ -607,15 +587,15 @@ chars
select(id, name, gender, culture, born, died, alive)
characters
#&gt; # A tibble: 30 × 7
#&gt; id name gender culture born died alive
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;lgl&gt;
#&gt; 1 1022 Theon Greyjoy Male "Ironborn" "In 278 AC or 279 AC… "" TRUE
#&gt; 2 1052 Tyrion Lannister Male "" "In 273 AC, at Caste… "" TRUE
#&gt; 3 1074 Victarion Greyjoy Male "Ironborn" "In 268 AC or before… "" TRUE
#&gt; 4 1109 Will Male "" "" "In … FALSE
#&gt; 5 1166 Areo Hotah Male "Norvoshi" "In 257 AC or before… "" TRUE
#&gt; 6 1267 Chett Male "" "At Hag's Mire" "In … FALSE
#&gt; # … with 24 more rows</pre>
#&gt; id name gender culture born died
#&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 1022 Theon Greyjoy Male "Ironborn" "In 278 AC or 27… ""
#&gt; 2 1052 Tyrion Lannister Male "" "In 273 AC, at C… ""
#&gt; 3 1074 Victarion Greyjoy Male "Ironborn" "In 268 AC or be… ""
#&gt; 4 1109 Will Male "" "" "In 297 AC, at…
#&gt; 5 1166 Areo Hotah Male "Norvoshi" "In 257 AC or be… ""
#&gt; 6 1267 Chett Male "" "At Hag's Mire" "In 299 AC, at…
#&gt; # … with 24 more rows, and 1 more variable: alive &lt;lgl&gt;</pre>
</div>
<p>There are also many list-columns:</p>
<div class="cell">
@@ -828,15 +808,16 @@ Deeply nested</h2>
unnest_wider(results)
locations
#&gt; # A tibble: 7 × 6
#&gt; city address_compone…¹ formatted_address geometry place_id types
#&gt; &lt;chr&gt; &lt;list&gt; &lt;chr&gt; &lt;list&gt; &lt;chr&gt; &lt;list&gt;
#&gt; 1 Houston &lt;list [4]&gt; Houston, TX, USA &lt;named list&gt; ChIJAYW&lt;list&gt;
#&gt; 2 Washington &lt;list [2]&gt; Washington, USA &lt;named list&gt; ChIJ-bD&lt;list&gt;
#&gt; 3 Washington &lt;list [4]&gt; Washington, DC, … &lt;named list&gt; ChIJW-T&lt;list&gt;
#&gt; 4 New York &lt;list [3]&gt; New York, NY, USA &lt;named list&gt; ChIJOwg&lt;list&gt;
#&gt; 5 Chicago &lt;list [4]&gt; Chicago, IL, USA &lt;named list&gt; ChIJ7cv&lt;list&gt;
#&gt; 6 Arlington &lt;list [4]&gt; Arlington, TX, U… &lt;named list&gt; ChIJ05g&lt;list&gt;
#&gt; # … with 1 more row, and abbreviated variable name ¹address_components</pre>
#&gt; city address_compone…¹ formatted_address geometry place_id
#&gt; &lt;chr&gt; &lt;list&gt; &lt;chr&gt; &lt;list&gt; &lt;chr&gt;
#&gt; 1 Houston &lt;list [4]&gt; Houston, TX, USA &lt;named list&gt; ChIJAYWNSLS4QI…
#&gt; 2 Washington &lt;list [2]&gt; Washington, USA &lt;named list&gt; ChIJ-bDD5__lhV…
#&gt; 3 Washington &lt;list [4]&gt; Washington, DC, … &lt;named list&gt; ChIJW-T2Wt7Gt4…
#&gt; 4 New York &lt;list [3]&gt; New York, NY, USA &lt;named list&gt; ChIJOwg_06VPwo…
#&gt; 5 Chicago &lt;list [4]&gt; Chicago, IL, USA &lt;named list&gt; ChIJ7cv00DwsDo…
#&gt; 6 Arlington &lt;list [4]&gt; Arlington, TX, U… &lt;named list&gt; ChIJ05gI5NJiTo…
#&gt; # … with 1 more row, 1 more variable: types &lt;list&gt;, and abbreviated variable
#&gt; # name ¹address_components</pre>
</div>
<p>Now we can see why two cities got two results: Washington matched both Washington state and Washington, DC, and Arlington matched Arlington, Virginia and Arlington, Texas.</p>
<p>There are a few different places we could go from here. We might want to determine the exact location of the match, which is stored in the <code>geometry</code> list-column:</p>
@@ -937,7 +918,7 @@ locations
<p>If these case studies have whetted your appetite for more real-life rectangling, you can see a few more examples in <code>vignette("rectangling", package = "tidyr")</code>.</p>
</section>
<section id="exercises-1" data-type="sect2">
<section id="rectangling-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Roughly estimate when <code>gh_repos</code> was created. Why can you only roughly estimate the date?</p></li>
@@ -965,7 +946,7 @@ Exercises</h2>
JSON</h1>
<p>All of the case studies in the previous section were sourced from wild-caught JSON. JSON is short for <strong>J</strong>ava<strong>S</strong>cript <strong>O</strong>bject <strong>N</strong>otation and is the way that most web APIs return data. It’s worth understanding because, while JSON and R’s data types are pretty similar, there isn’t a perfect 1-to-1 mapping, so knowing a bit about JSON helps when things go wrong.</p>
<section id="data-types" data-type="sect2">
<section id="rectangling-data-types" data-type="sect2">
<h2>
Data types</h2>
<p>JSON is a simple format designed to be easily read and written by machines, not humans. It has six key data types. Four of them are scalars:</p>
@@ -1083,7 +1064,7 @@ Translation challenges</h2>
<p>Since JSON doesn’t have any way to represent dates or date-times, they’re often stored as ISO8601 date times in strings, and you’ll need to use <code><a href="https://readr.tidyverse.org/reference/parse_datetime.html">readr::parse_date()</a></code> or <code><a href="https://readr.tidyverse.org/reference/parse_datetime.html">readr::parse_datetime()</a></code> to turn them into the correct data structure. Similarly, JSON’s rules for representing floating point numbers are a little imprecise, so you’ll also sometimes find numbers stored in strings. Apply <code><a href="https://readr.tidyverse.org/reference/parse_atomic.html">readr::parse_double()</a></code> as needed to get the correct variable type.</p>
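<p>The conversion described above might look like the following sketch. The tibble <code>api_rows</code>, its column names, and its values are hypothetical, made up only to stand in for strings that came out of a JSON response:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(dplyr)
library(readr)

# Hypothetical data: date-times and numbers that arrived as JSON strings
api_rows &lt;- tibble(
  created_at = c("2016-10-05T23:46:08Z", "2015-08-10T18:04:04Z"),
  rating = c("4.5", "3")
)

# Parse each string column into the type it is meant to represent
parsed &lt;- api_rows |&gt;
  mutate(
    created_at = parse_datetime(created_at),
    rating = parse_double(rating)
  )
parsed</pre>
</div>
<p>With its default format, <code>parse_datetime()</code> expects ISO8601 input, so no format string is given here; other date layouts would need an explicit <code>format</code> argument.</p>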
</section>
<section id="exercises-2" data-type="sect2">
<section id="rectangling-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
@@ -1110,7 +1091,7 @@ df_row &lt;- tibble(json = json_row)</pre>
</ol></section>
</section>
<section id="summary" data-type="sect1">
<section id="rectangling-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, you learned what lists are, how you can generate them from JSON files, and how to turn them into rectangular data frames. Surprisingly, we only need two new functions: <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">unnest_longer()</a></code> to put list elements into rows and <code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code> to put list elements into columns. It doesn’t matter how deeply nested the list-column is, all you need to do is repeatedly call these two functions.</p>
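<p>As a compact recap, here is a sketch of that pattern on a small made-up tibble. The object <code>nested</code> and its contents are hypothetical, loosely modelled on the shape of <code>gh_repos</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tibble)
library(tidyr)

# Hypothetical data: each user has an unnamed list of repos,
# and each repo is a named list
nested &lt;- tibble(
  user = c("a", "b"),
  repos = list(
    list(list(name = "r1", stars = 10L), list(name = "r2", stars = 3L)),
    list(list(name = "r3", stars = 7L))
  )
)

flat &lt;- nested |&gt;
  unnest_longer(repos) |&gt;  # unnamed list: one row per repo
  unnest_wider(repos)      # named list: one column per field
flat</pre>
</div>
<p>The unnamed level goes into rows first, then the named level goes into columns; a more deeply nested input would just need more calls in the same style.</p>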