More minor page count tweaks & fixes
And re-convert with latest htmlbook
@@ -1,24 +1,15 @@
|
||||
<section data-type="chapter" id="chp-strings">
|
||||
<h1><span id="sec-strings" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Strings</span></span></h1>
|
||||
<section id="introduction" data-type="sect1">
|
||||
<section id="strings-introduction" data-type="sect1">
|
||||
<h1>
|
||||
Introduction</h1>
|
||||
<p>So far, you’ve used a bunch of strings without learning much about the details. Now it’s time to dive into them, learn what makes strings tick, and master some of the powerful string manipulation tools you have at your disposal.</p>
|
||||
<p>We’ll begin with the details of creating strings and character vectors. You’ll then dive into creating strings from data, then the opposite: extracting strings from data. We’ll then discuss tools that work with individual letters. The chapter finishes with a brief discussion of where your expectations from English might steer you wrong when working with other languages.</p>
|
||||
<p>We’ll keep working with strings in the next chapter, where you’ll learn more about the power of regular expressions.</p>
|
||||
|
||||
<section id="prerequisites" data-type="sect2">
|
||||
<section id="strings-prerequisites" data-type="sect2">
|
||||
<h2>
|
||||
Prerequisites</h2>
|
||||
<div data-type="important"><div class="callout-body d-flex">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon"/>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
<p>This chapter relies on features only found in tidyr 1.3.0, which is still in development. If you want to live on the edge, you can get the dev versions with <code>devtools::install_github("tidyverse/tidyr")</code>.</p></div>
|
||||
|
||||
<p>In this chapter, we’ll use functions from the stringr package, which is part of the core tidyverse. We’ll also use the babynames data since it provides some fun strings to manipulate.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
|
||||
@@ -113,7 +104,7 @@ str_view(x)
|
||||
<p>Note that <code><a href="https://stringr.tidyverse.org/reference/str_view.html">str_view()</a></code> uses a blue background for tabs to make them easier to spot. One of the challenges of working with text is that there’s a variety of ways that white space can end up in the text, so this background helps you recognize that something strange is going on.</p>
|
||||
</section>
|
||||
|
||||
<section id="exercises" data-type="sect2">
|
||||
<section id="strings-exercises" data-type="sect2">
|
||||
<h2>
|
||||
Exercises</h2>
|
||||
<ol type="1"><li>
|
||||
@@ -138,7 +129,7 @@ Creating many strings from data</h1>
|
||||
|
||||
<section id="str_c" data-type="sect2">
|
||||
<h2>
|
||||
<code>str_c()</code>
|
||||
str_c()
|
||||
</h2>
|
||||
<p><code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code> takes any number of vectors as arguments and returns a character vector:</p>
|
||||
<div class="cell">
|
||||
@@ -151,16 +142,14 @@ str_c("Hello ", c("John", "Susan"))
|
||||
</div>
|
||||
<p><code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code> is very similar to the base <code><a href="https://rdrr.io/r/base/paste.html">paste0()</a></code>, but is designed to be used with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> by obeying the usual tidyverse rules for recycling and propagating missing values:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">set.seed(1410)
|
||||
df <- tibble(name = c(wakefield::name(3), NA))
|
||||
<pre data-type="programlisting" data-code-language="r">df <- tibble(name = c("Flora", "David", "Terra"))
|
||||
df |> mutate(greeting = str_c("Hi ", name, "!"))
|
||||
#> # A tibble: 4 × 2
|
||||
#> name greeting
|
||||
#> <chr> <chr>
|
||||
#> 1 Ilena Hi Ilena!
|
||||
#> 2 Sacramento Hi Sacramento!
|
||||
#> 3 Graylon Hi Graylon!
|
||||
#> 4 <NA> <NA></pre>
|
||||
#> # A tibble: 3 × 2
|
||||
#> name greeting
|
||||
#> <chr> <chr>
|
||||
#> 1 Flora Hi Flora!
|
||||
#> 2 David Hi David!
|
||||
#> 3 Terra Hi Terra!</pre>
|
||||
</div>
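<p>To make the difference in missing-value handling concrete, here is a minimal sketch on a plain character vector. The vector <code>name</code> is made up for this illustration and is not part of the chapter’s data:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)

# Illustrative input with one missing value (an assumption for this sketch)
name <- c("Flora", NA)

# Base paste0() turns the missing value into the literal text "NA"
paste0("Hi ", name, "!")
#> [1] "Hi Flora!" "Hi NA!"

# str_c() propagates the missing value instead
str_c("Hi ", name, "!")
#> [1] "Hi Flora!" NA</pre>
</div>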
|
||||
<p>If you want missing values to display in another way, use <code><a href="https://dplyr.tidyverse.org/reference/coalesce.html">coalesce()</a></code> to replace them. Depending on what you want, you might use it either inside or outside of <code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code>:</p>
|
||||
<div class="cell">
|
||||
@@ -169,48 +158,45 @@ df |> mutate(greeting = str_c("Hi ", name, "!"))
|
||||
greeting1 = str_c("Hi ", coalesce(name, "you"), "!"),
|
||||
greeting2 = coalesce(str_c("Hi ", name, "!"), "Hi!")
|
||||
)
|
||||
#> # A tibble: 4 × 3
|
||||
#> name greeting1 greeting2
|
||||
#> <chr> <chr> <chr>
|
||||
#> 1 Ilena Hi Ilena! Hi Ilena!
|
||||
#> 2 Sacramento Hi Sacramento! Hi Sacramento!
|
||||
#> 3 Graylon Hi Graylon! Hi Graylon!
|
||||
#> 4 <NA> Hi you! Hi!</pre>
|
||||
#> # A tibble: 3 × 3
|
||||
#> name greeting1 greeting2
|
||||
#> <chr> <chr> <chr>
|
||||
#> 1 Flora Hi Flora! Hi Flora!
|
||||
#> 2 David Hi David! Hi David!
|
||||
#> 3 Terra Hi Terra! Hi Terra!</pre>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section id="sec-glue" data-type="sect2">
|
||||
<h2>
|
||||
<code>str_glue()</code>
|
||||
str_glue()
|
||||
</h2>
|
||||
<p>If you are mixing many fixed and variable strings with <code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code>, you’ll notice that you type a lot of <code>"</code>s, making it hard to see the overall goal of the code. An alternative approach is provided by the <a href="https://glue.tidyverse.org">glue package</a> via <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code><span data-type="footnote">If you’re not using stringr, you can also access it directly with <code><a href="https://glue.tidyverse.org/reference/glue.html">glue::glue()</a></code>.</span>. You give it a single string that has a special feature: anything inside <code><a href="https://rdrr.io/r/base/Paren.html">{}</a></code> will be evaluated like it’s outside of the quotes:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">df |> mutate(greeting = str_glue("Hi {name}!"))
|
||||
#> # A tibble: 4 × 2
|
||||
#> name greeting
|
||||
#> <chr> <glue>
|
||||
#> 1 Ilena Hi Ilena!
|
||||
#> 2 Sacramento Hi Sacramento!
|
||||
#> 3 Graylon Hi Graylon!
|
||||
#> 4 <NA> Hi NA!</pre>
|
||||
#> # A tibble: 3 × 2
|
||||
#> name greeting
|
||||
#> <chr> <glue>
|
||||
#> 1 Flora Hi Flora!
|
||||
#> 2 David Hi David!
|
||||
#> 3 Terra Hi Terra!</pre>
|
||||
</div>
|
||||
<p>As you can see, <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code> currently converts missing values to the string <code>"NA"</code>, unfortunately making it inconsistent with <code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code>.</p>
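<p>A small sketch of this behavior, using an illustrative vector (not the chapter’s <code>df</code>) that contains a missing value. One possible workaround, assuming dplyr is loaded, is to call <code>coalesce()</code> inside the braces, since any R expression can appear there:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)
library(dplyr)

# Illustrative input with one missing value (an assumption for this sketch)
name <- c("Flora", NA)

# str_glue() converts the missing value to the text "NA"
str_glue("Hi {name}!")
#> Hi Flora!
#> Hi NA!

# coalesce() inside {} supplies a fallback before the value is interpolated
str_glue("Hi {coalesce(name, 'you')}!")
#> Hi Flora!
#> Hi you!</pre>
</div>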
|
||||
<p>You also might wonder what happens if you need to include a regular <code>{</code> or <code>}</code> in your string. You’re on the right track if you guess you’ll need to escape it somehow. The trick is that glue uses a slightly different escaping technique: instead of prefixing with a special character like <code>\</code>, you double up the special characters:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">df |> mutate(greeting = str_glue("{{Hi {name}!}}"))
|
||||
#> # A tibble: 4 × 2
|
||||
#> name greeting
|
||||
#> <chr> <glue>
|
||||
#> 1 Ilena {Hi Ilena!}
|
||||
#> 2 Sacramento {Hi Sacramento!}
|
||||
#> 3 Graylon {Hi Graylon!}
|
||||
#> 4 <NA> {Hi NA!}</pre>
|
||||
#> # A tibble: 3 × 2
|
||||
#> name greeting
|
||||
#> <chr> <glue>
|
||||
#> 1 Flora {Hi Flora!}
|
||||
#> 2 David {Hi David!}
|
||||
#> 3 Terra {Hi Terra!}</pre>
|
||||
</div>
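<p>Doubled braces are handy whenever the output format itself uses braces. As a rough sketch, the following builds a JSON-like fragment; the variables <code>key</code> and <code>value</code> are hypothetical names chosen for this example:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)

# Hypothetical values to interpolate
key <- "name"
value <- "Flora"

# {{ and }} become literal braces; {key} and {value} are evaluated
json <- str_glue('{{"{key}": "{value}"}}')
json
#> {"name": "Flora"}</pre>
</div>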
|
||||
</section>
|
||||
|
||||
<section id="str_flatten" data-type="sect2">
|
||||
<h2>
|
||||
<code>str_flatten()</code>
|
||||
str_flatten()
|
||||
</h2>
|
||||
<p><code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code> and <code><a href="https://stringr.tidyverse.org/reference/str_glue.html">str_glue()</a></code> work well with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> because their output is the same length as their inputs. What if you want a function that works well with <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>, i.e., something that always returns a single string? That’s the job of <code><a href="https://stringr.tidyverse.org/reference/str_flatten.html">str_flatten()</a></code><span data-type="footnote">The base R equivalent is <code><a href="https://rdrr.io/r/base/paste.html">paste()</a></code> used with the <code>collapse</code> argument.</span>: it takes a character vector and combines each element of the vector into a single string:</p>
|
||||
<div class="cell">
|
||||
@@ -244,7 +230,7 @@ df |>
|
||||
</div>
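<p>For readers coming from base R, the equivalence mentioned in the footnote can be sketched as follows. The vector <code>x</code> is an arbitrary example:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)

# Arbitrary example input
x <- c("a", "b", "c")

# stringr: always returns a single string
flat <- str_flatten(x, ", ")
flat
#> [1] "a, b, c"

# Base R: paste() with the collapse argument gives the same result here
base_flat <- paste(x, collapse = ", ")
base_flat
#> [1] "a, b, c"</pre>
</div>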
|
||||
</section>
|
||||
|
||||
<section id="exercises-1" data-type="sect2">
|
||||
<section id="strings-exercises-1" data-type="sect2">
|
||||
<h2>
|
||||
Exercises</h2>
|
||||
<ol type="1"><li>
|
||||
@@ -598,7 +584,12 @@ Long strings</h2>
|
||||
<li><p><code>str_wrap(x, 30)</code> wraps a string introducing new lines so that each line is at most 30 characters (it doesn’t hyphenate, however, so any word longer than 30 characters will make a longer line)</p></li>
|
||||
</ul><p>The following code shows these functions in action with a made-up string:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="r">x <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
|
||||
<pre data-type="programlisting" data-code-language="r">x <- paste0(
|
||||
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod ",
|
||||
"tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim ",
|
||||
"veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea ",
|
||||
"commodo consequat."
|
||||
)
|
||||
|
||||
str_view(str_trunc(x, 30))
|
||||
#> [1] │ Lorem ipsum dolor sit amet,...
|
||||
@@ -610,12 +601,12 @@ str_view(str_wrap(x, 30))
|
||||
#> │ magna aliqua. Ut enim ad
|
||||
#> │ minim veniam, quis nostrud
|
||||
#> │ exercitation ullamco laboris
|
||||
#> │ nisi ut aliquip ex ea commodo
|
||||
#> │ consequat.</pre>
|
||||
</div>
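<p>The caveat about hyphenation can be checked with a short sketch. The input string below is invented for this example; the code only inspects line lengths rather than relying on exact break positions:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)

# Invented input containing one word much longer than the target width
y <- "a supercalifragilisticexpialidocious word"

wrapped <- str_wrap(y, 10)

# Split the wrapped result back into its lines and measure each one
lines <- str_split(wrapped, "\n")[[1]]
line_lengths <- str_length(lines)

# The long word is kept whole, so at least one line exceeds the width of 10
any(line_lengths > 10)
#> [1] TRUE

# str_trunc() counts the ellipsis toward the requested width
str_length(str_trunc(y, 30))
#> [1] 30</pre>
</div>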
|
||||
</section>
|
||||
|
||||
<section id="exercises-2" data-type="sect2">
|
||||
<section id="strings-exercises-2" data-type="sect2">
|
||||
<h2>
|
||||
Exercises</h2>
|
||||
<ol type="1"><li>Use <code><a href="https://stringr.tidyverse.org/reference/str_length.html">str_length()</a></code> and <code><a href="https://stringr.tidyverse.org/reference/str_sub.html">str_sub()</a></code> to extract the middle letter from each baby name. What will you do if the string has an even number of characters?</li>
|
||||
@@ -734,7 +725,7 @@ str_sort(c("a", "c", "ch", "h", "z"), locale = "cs")
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section id="summary" data-type="sect1">
|
||||
<section id="strings-summary" data-type="sect1">
|
||||
<h1>
|
||||
Summary</h1>
|
||||
<p>In this chapter, you’ve learned about some of the power of the stringr package: how to create, combine, and extract strings, and about some of the challenges you might face with non-English strings. Now it’s time to learn one of the most important and powerful tools for working with strings: regular expressions. Regular expressions are a very concise but very expressive language for describing patterns within strings and are the topic of the next chapter.</p>
|
||||
|
||||