More minor page count tweaks & fixes
And re-convert with latest htmlbook
		@@ -1,6 +1,6 @@
<section data-type="chapter" id="chp-iteration">
<h1><span id="sec-iteration" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Iteration</span></span></h1>
<section id="introduction" data-type="sect1">
<section id="iteration-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>In this chapter, you’ll learn tools for iteration, repeatedly performing the same action on different objects. Iteration in R tends to look rather different from iteration in other programming languages because so much of it is implicit and we get it for free. For example, if you want to double a numeric vector <code>x</code> in R, you can just write <code>2 * x</code>. In most other languages, you’d need to explicitly double each element of <code>x</code> using some sort of for loop.</p>
@@ -13,17 +13,19 @@ Introduction</h1>
<code><a href="https://tidyr.tidyverse.org/reference/unnest_wider.html">unnest_wider()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/unnest_longer.html">unnest_longer()</a></code> create new rows and columns for each element of a list-column.</li>
</ul><p>Now it’s time to learn some more general tools, often called <strong>functional programming</strong> tools because they are built around functions that take other functions as inputs. Learning functional programming can easily veer into the abstract, but in this chapter we’ll keep things concrete by focusing on three common tasks: modifying multiple columns, reading multiple files, and saving multiple objects.</p>
<section id="prerequisites" data-type="sect2">
<section id="iteration-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<div data-type="important"><div class="callout-body d-flex">
<div data-type="important">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"/>
</div>
</div>
<p>This chapter relies on features only found in dplyr 1.1.0, which is still in development. If you want to live life on the edge you can get the dev version with <code>devtools::install_github(c( "tidyverse/dplyr"))</code>.</p>
<p>This chapter relies on features only found in purrr 1.0.0 and dplyr 1.1.0, which are still in development. If you want to live life on the edge you can get the dev version with <code>devtools::install_github(c("tidyverse/purrr", "tidyverse/dplyr"))</code>.</p></div>
</div>
</div>
<p>In this chapter, we’ll focus on tools provided by dplyr and purrr, both core members of the tidyverse. You’ve seen dplyr before, but <a href="http://purrr.tidyverse.org/">purrr</a> is new. We’re just going to use a couple of purrr functions in this chapter, but it’s a great package to explore as you improve your programming skills.</p>
<div class="cell">
@@ -73,7 +75,7 @@ Modifying multiple columns</h1>
<section id="selecting-columns-with-.cols" data-type="sect2">
<h2>
Selecting columns with<code>.cols</code>
Selecting columns with .cols
</h2>
<p>The first argument to <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>, <code>.cols</code>, selects the columns to transform. This uses the same specifications as <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <a href="#sec-select" data-type="xref">#sec-select</a>, so you can use functions like <code><a href="https://tidyselect.r-lib.org/reference/starts_with.html">starts_with()</a></code> and <code><a href="https://tidyselect.r-lib.org/reference/starts_with.html">ends_with()</a></code> to select columns based on their name.</p>
<p>There are two additional selection techniques that are particularly useful for <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>: <code><a href="https://tidyselect.r-lib.org/reference/everything.html">everything()</a></code> and <code><a href="https://tidyselect.r-lib.org/reference/where.html">where()</a></code>. <code><a href="https://tidyselect.r-lib.org/reference/everything.html">everything()</a></code> is straightforward: it selects every (non-grouping) column:</p>
@@ -316,12 +318,10 @@ df_miss |> filter(if_all(a:d, is.na))
<section id="across-in-functions" data-type="sect2">
<h2>
<code>across()</code> in functions</h2>
across() in functions</h2>
<p><code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code> is particularly useful to program with because it allows you to operate on multiple columns. For example, <a href="https://twitter.com/_wurli/status/1571836746899283969">Jacob Scott</a> uses this little helper which wraps a bunch of lubridate functions to expand all date columns into year, month, and day columns:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(lubridate)
expand_dates <- function(df) {
<pre data-type="programlisting" data-code-language="r">expand_dates <- function(df) {
  df |> 
    mutate(
      across(where(is.Date), list(year = year, month = month, day = mday))
@@ -382,7 +382,7 @@ diamonds |>
<section id="vs-pivot_longer" data-type="sect2">
<h2>
Vs<code>pivot_longer()</code>
Vs pivot_longer()
</h2>
<p>Before we go on, it’s worth pointing out an interesting connection between <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/pivot_longer.html">pivot_longer()</a></code> (<a href="#sec-pivoting" data-type="xref">#sec-pivoting</a>). In many cases, you can perform the same calculations by first pivoting the data and then performing the operations by group rather than by column. For example, take this multi-function summary:</p>
<div class="cell">
@@ -472,7 +472,7 @@ df_long |>
<p>If needed, you could <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code> this back to the original form.</p>
</section>
<section id="exercises" data-type="sect2">
<section id="iteration-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Compute the number of unique values in each column of <code><a href="https://allisonhorst.github.io/palmerpenguins/reference/penguins.html">palmerpenguins::penguins</a></code>.</p></li>
@@ -535,7 +535,7 @@ paths
</div>
</section>
<section id="lists" data-type="sect2">
<section id="iteration-lists" data-type="sect2">
<h2>
Lists</h2>
<p>Now that we have these 12 paths, we could call <code>read_excel()</code> 12 times to get 12 data frames:</p>
@@ -575,7 +575,7 @@ gapminder_2007 <- readxl::read_excel("data/gapminder/2007.xlsx")</pre>
<section id="purrrmap-and-list_rbind" data-type="sect2">
<h2>
<code>purrr::map()</code> and <code>list_rbind()</code>
purrr::map() and list_rbind()
</h2>
<p>The code to collect those data frames in a list “by hand” is basically just as tedious to type as code that reads the files one-by-one. Happily, we can use <code><a href="https://purrr.tidyverse.org/reference/map.html">purrr::map()</a></code> to make even better use of our <code>paths</code> vector. <code><a href="https://purrr.tidyverse.org/reference/map.html">map()</a></code> is similar to <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>, but instead of doing something to each column in a data frame, it does something to each element of a vector. <code>map(x, f)</code> is shorthand for:</p>
<div class="cell">
@@ -919,7 +919,7 @@ DBI::dbCreateTable(con, "gapminder", template)</pre>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">con |> tbl("gapminder")
#> # Source:   table<gapminder> [0 x 6]
#> # Database: DuckDB 0.6.1 [root@Darwin 22.1.0:R 4.2.1/:memory:]
#> # Database: DuckDB 0.6.1 [root@Darwin 22.2.0:R 4.2.1/:memory:]
#> # … with 6 variables: country <chr>, continent <chr>, lifeExp <dbl>,
#> #   pop <dbl>, gdpPercap <dbl>, year <dbl></pre>
</div>
@@ -932,7 +932,7 @@ DBI::dbCreateTable(con, "gapminder", template)</pre>
  DBI::dbAppendTable(con, "gapminder", df)
}</pre>
</div>
<p>Now we need to call <code>append_csv()</code> once for each element of <code>paths</code>. That’s certainly possible with <code><a href="https://purrr.tidyverse.org/reference/map.html">map()</a></code>:</p>
<p>Now we need to call <code>append_file()</code> once for each element of <code>paths</code>. That’s certainly possible with <code><a href="https://purrr.tidyverse.org/reference/map.html">map()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">paths |> map(append_file)</pre>
</div>
@@ -946,7 +946,7 @@ DBI::dbCreateTable(con, "gapminder", template)</pre>
  tbl("gapminder") |> 
  count(year)
#> # Source:   SQL [?? x 2]
#> # Database: DuckDB 0.6.1 [root@Darwin 22.1.0:R 4.2.1/:memory:]
#> # Database: DuckDB 0.6.1 [root@Darwin 22.2.0:R 4.2.1/:memory:]
#>    year     n
#>   <dbl> <dbl>
#> 1  1952   142
@@ -1071,7 +1071,7 @@ ggsave(by_clarity$path[[8]], by_clarity$plot[[8]], width = 6, height = 6)</pre>
</section>
</section>
<section id="summary" data-type="sect1">
<section id="iteration-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, you’ve seen how to use explicit iteration to solve three problems that come up frequently when doing data science: manipulating multiple columns, reading multiple files, and saving multiple outputs. But in general, iteration is a superpower: if you know the right iteration technique, you can easily go from fixing one problem to fixing all the problems. Once you’ve mastered the techniques in this chapter, we highly recommend learning more by reading the <a href="https://adv-r.hadley.nz/functionals.html">Functionals chapter</a> of <em>Advanced R</em> and consulting the <a href="https://purrr.tidyverse.org">purrr website</a>.</p>
 